This project will be carried out in four stages:
# import libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error
from scipy.signal import savgol_filter
# setting
# display max columns
pd.set_option('display.max_columns', None)
The flood data come from PATRIOT-Net: https://patriotnet.id/
The weather data come from Visual Crossing: https://www.visualcrossing.com/
Reference location: Batu Gadang, Padang. Lat, long: -0.957210, 100.478977
Data range: 03/06/2022 - 12/11/2022
# read data
b401_raw = pd.read_csv('dataset/dataset_raw/sensor_B-401.csv') # raw flood data, still dirty
b401_cleaned = pd.read_csv('dataset/dataset_raw/sensor_B-401_cleaned.csv') # flood data after anomaly removal
data_cuaca = pd.read_csv('dataset/dataset_raw/data_cuaca.csv') # weather data
This is the original data as retrieved from PATRIOT-Net.
b401_raw
| | Date | Day | Battery | Height |
|---|---|---|---|---|
| 0 | 03/06/2022 00:02 | 03/06/2022 | 12.42 | 18.13 |
| 1 | 03/06/2022 00:12 | 03/06/2022 | 12.42 | 6.00 |
| 2 | 03/06/2022 00:23 | 03/06/2022 | 12.42 | 24.20 |
| 3 | 03/06/2022 00:26 | 03/06/2022 | 12.42 | 198.18 |
| 4 | 03/06/2022 00:33 | 03/06/2022 | 12.42 | -339.94 |
| ... | ... | ... | ... | ... |
| 19517 | 12/11/2022 23:34 | 12/11/2022 | 11.27 | 117.26 |
| 19518 | 12/11/2022 23:41 | 12/11/2022 | 11.24 | -400.63 |
| 19519 | 12/11/2022 23:45 | 12/11/2022 | 11.24 | -248.90 |
| 19520 | 12/11/2022 23:48 | 12/11/2022 | 11.24 | 99.05 |
| 19521 | 12/11/2022 23:55 | 12/11/2022 | 11.21 | 113.21 |
19522 rows × 4 columns
b401_raw.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19522 entries, 0 to 19521
Data columns (total 4 columns):
 #   Column   Non-Null Count  Dtype
---  ------   --------------  -----
 0   Date     19522 non-null  object
 1   Day      19522 non-null  object
 2   Battery  19522 non-null  float64
 3   Height   19522 non-null  float64
dtypes: float64(2), object(2)
memory usage: 610.2+ KB
b401_raw.describe().T
| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Battery | 19522.0 | 11.664755 | 0.398244 | 10.81 | 11.41 | 11.61 | 11.86 | 13.10 |
| Height | 19522.0 | 13.539526 | 145.665317 | -402.65 | 30.27 | 48.48 | 80.85 | 451.06 |
The 'Height' column contains anomalies: negative values and an extremely high maximum. These anomalies will be removed.
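The cleaning that produced `sensor_B-401_cleaned.csv` happened outside this notebook; below is a minimal sketch of how such out-of-range readings could be masked. The 0-300 cm valid range is an assumption for illustration, not the threshold used for the real file:

```python
import pandas as pd

# hypothetical valid sensor range in cm (an assumption, chosen for illustration)
LOW, HIGH = 0, 300

heights = pd.Series([18.13, 6.00, 24.20, 198.18, -339.94, 451.06])
# readings outside the valid range become NaN, so they can be imputed later
cleaned = heights.where(heights.between(LOW, HIGH))
print(cleaned.isna().sum())  # 2 anomalous readings flagged
```

`Series.where` keeps values where the condition holds and writes NaN elsewhere, which matches the NaN pattern seen in `b401_cleaned`.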
# plot the raw data
plt.figure(figsize=(15, 5))
plt.plot(b401_raw['Height'], '-')
plt.xlabel('Index', fontsize=12)
plt.ylabel('Height (cm)', fontsize=12)
plt.show()
# plot a sample of the data
plt.figure(figsize=(15, 5))
plt.plot(b401_raw['Height'][5000:6000], '-')
plt.xlabel('Index', fontsize=12)
plt.ylabel('Height (cm)', fontsize=12)
plt.show()
This is the data after anomaly removal.
b401_cleaned
| | Date | Day | Battery | Height |
|---|---|---|---|---|
| 0 | 03/06/2022 00:02 | 03/06/2022 | 12.42 | 24.20 |
| 1 | 03/06/2022 00:12 | 03/06/2022 | 12.42 | NaN |
| 2 | 03/06/2022 00:23 | 03/06/2022 | 12.42 | 24.20 |
| 3 | 03/06/2022 00:26 | 03/06/2022 | 12.42 | NaN |
| 4 | 03/06/2022 00:33 | 03/06/2022 | 12.42 | NaN |
| ... | ... | ... | ... | ... |
| 19517 | 12/11/2022 23:34 | 12/11/2022 | 11.27 | 117.26 |
| 19518 | 12/11/2022 23:41 | 12/11/2022 | 11.24 | NaN |
| 19519 | 12/11/2022 23:45 | 12/11/2022 | 11.24 | NaN |
| 19520 | 12/11/2022 23:48 | 12/11/2022 | 11.24 | NaN |
| 19521 | 12/11/2022 23:55 | 12/11/2022 | 11.21 | 113.21 |
19522 rows × 4 columns
b401_cleaned.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19522 entries, 0 to 19521
Data columns (total 4 columns):
 #   Column   Non-Null Count  Dtype
---  ------   --------------  -----
 0   Date     19522 non-null  object
 1   Day      19522 non-null  object
 2   Battery  19522 non-null  float64
 3   Height   14200 non-null  float64
dtypes: float64(2), object(2)
memory usage: 610.2+ KB
b401_cleaned.describe().T
| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Battery | 19522.0 | 11.664755 | 0.398244 | 10.81 | 11.41 | 11.61 | 11.86 | 13.10 |
| Height | 14200.0 | 65.333808 | 34.203176 | 18.13 | 38.36 | 54.55 | 88.94 | 250.78 |
# plot the cleaned data
plt.figure(figsize=(15, 5))
plt.plot(b401_cleaned['Height'], '-')
plt.xlabel('Index', fontsize=12)
plt.ylabel('Height (cm)', fontsize=12)
plt.show()
# plot a sample of the cleaned data
plt.figure(figsize=(15, 5))
plt.plot(b401_cleaned['Height'][5000:6000], '-')
plt.xlabel('Index', fontsize=12)
plt.ylabel('Height (cm)', fontsize=12)
plt.show()
# missing data
print('Total number of rows:', len(b401_cleaned))
print('Number of missing values:', b401_cleaned.Height.isnull().sum())
print('Percentage of missing values:', (b401_cleaned.Height.isnull().sum()*100/len(b401_cleaned)).round(2), '%')
Total number of rows: 19522
Number of missing values: 5322
Percentage of missing values: 27.26 %
The anomalies have been removed, which leaves missing values; these will be filled by data imputation (interpolation and regression).
Note:
- There are missing values caused by time skips while the sensor was inactive (these will be handled with a machine-learning model on the combined flood and weather data; the weather variables are used to predict the 'height' variable).
- The time intervals in the flood data are not uniform (the data will be resampled to a uniform interval, since time-series prediction needs evenly spaced data).
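A tiny sketch of what the planned resampling does to irregular timestamps; the readings here are made up, but they mirror the B-401 log:

```python
import pandas as pd

# made-up irregular readings, similar in spirit to the sensor log
ts = pd.to_datetime(['2022-06-03 00:02', '2022-06-03 00:23', '2022-06-03 00:41'])
s = pd.Series([24.2, 24.9, 25.3], index=ts)

# snap to a uniform 10-minute grid; bins with no reading become NaN
uniform = s.resample('10min').mean()
print(uniform)
# bins 00:00, 00:10, 00:20, 00:30, 00:40 — the 00:10 and 00:30 bins are NaN
```

The NaN bins are exactly the gaps that interpolation and regression imputation will later have to fill.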
data_cuaca
| | name | datetime | temp | feelslike | dew | humidity | precip | precipprob | preciptype | snow | snowdepth | windgust | windspeed | winddir | sealevelpressure | cloudcover | visibility | solarradiation | solarenergy | uvindex | severerisk | conditions | icon | stations |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.957210, 100.478977 | 2022-06-03T00:00:00 | 18.7 | 18.7 | 17.6 | 93.32 | 0.0 | 0 | NaN | 0 | 0 | 3.6 | 4.7 | 68.5 | 1014 | 100.0 | 24.1 | 0 | NaN | 0 | 10 | Overcast | cloudy | remote |
| 1 | -0.957210, 100.478977 | 2022-06-03T01:00:00 | 18.8 | 18.8 | 17.5 | 92.16 | 0.0 | 0 | NaN | 0 | 0 | 4.0 | 5.4 | 73.8 | 1013 | 100.0 | 24.1 | 0 | NaN | 0 | 10 | Overcast | cloudy | remote |
| 2 | -0.957210, 100.478977 | 2022-06-03T02:00:00 | 18.9 | 18.9 | 17.4 | 91.01 | 0.0 | 0 | NaN | 0 | 0 | 4.0 | 5.4 | 73.9 | 1012 | 100.0 | 24.1 | 0 | NaN | 0 | 10 | Overcast | cloudy | remote |
| 3 | -0.957210, 100.478977 | 2022-06-03T03:00:00 | 19.1 | 19.1 | 17.4 | 89.88 | 0.0 | 0 | NaN | 0 | 0 | 4.3 | 6.1 | 80.2 | 1012 | 100.0 | 24.1 | 0 | NaN | 0 | 10 | Overcast | cloudy | remote |
| 4 | -0.957210, 100.478977 | 2022-06-03T04:00:00 | 19.4 | 19.4 | 17.3 | 87.66 | 0.0 | 0 | NaN | 0 | 0 | 4.0 | 5.0 | 76.7 | 1011 | 98.9 | 24.1 | 0 | NaN | 0 | 10 | Overcast | cloudy | remote |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3907 | -0.957210, 100.478977 | 2022-11-12T19:00:00 | 20.6 | 20.6 | 20.5 | 99.39 | 0.0 | 0 | NaN | 0 | 0 | 1.4 | 1.4 | 115.4 | 1010 | 100.0 | 1.9 | 0 | NaN | 0 | 10 | Overcast | cloudy | remote |
| 3908 | -0.957210, 100.478977 | 2022-11-12T20:00:00 | 20.6 | 20.6 | 20.4 | 98.77 | 0.0 | 0 | rain | 0 | 0 | 1.1 | 1.4 | 92.5 | 1011 | 100.0 | 1.3 | 0 | NaN | 0 | 10 | Overcast | cloudy | remote |
| 3909 | -0.957210, 100.478977 | 2022-11-12T21:00:00 | 20.5 | 20.5 | 20.2 | 98.16 | 0.1 | 100 | rain | 0 | 0 | 1.1 | 1.4 | 83.5 | 1012 | 100.0 | 1.7 | 0 | NaN | 0 | 10 | Rain, Overcast | rain | remote |
| 3910 | -0.957210, 100.478977 | 2022-11-12T22:00:00 | 20.4 | 20.4 | 20.1 | 98.16 | 0.1 | 100 | rain | 0 | 0 | 1.8 | 1.8 | 96.7 | 1012 | 100.0 | 1.7 | 0 | NaN | 0 | 10 | Rain, Overcast | rain | remote |
| 3911 | -0.957210, 100.478977 | 2022-11-12T23:00:00 | 20.2 | 20.2 | 20.0 | 98.77 | 0.3 | 100 | rain | 0 | 0 | 2.2 | 2.2 | 109.1 | 1012 | 100.0 | 1.7 | 0 | NaN | 0 | 10 | Rain, Overcast | rain | remote |
3912 rows × 24 columns
data_cuaca.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3912 entries, 0 to 3911
Data columns (total 24 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   name              3912 non-null   object
 1   datetime          3912 non-null   object
 2   temp              3912 non-null   float64
 3   feelslike         3912 non-null   float64
 4   dew               3912 non-null   float64
 5   humidity          3912 non-null   float64
 6   precip            3912 non-null   float64
 7   precipprob        3912 non-null   int64
 8   preciptype        1929 non-null   object
 9   snow              3912 non-null   int64
 10  snowdepth         3912 non-null   int64
 11  windgust          3912 non-null   float64
 12  windspeed         3912 non-null   float64
 13  winddir           3912 non-null   float64
 14  sealevelpressure  3912 non-null   int64
 15  cloudcover        3912 non-null   float64
 16  visibility        3912 non-null   float64
 17  solarradiation    3912 non-null   int64
 18  solarenergy       2066 non-null   float64
 19  uvindex           3912 non-null   int64
 20  severerisk        3912 non-null   int64
 21  conditions        3912 non-null   object
 22  icon              3912 non-null   object
 23  stations          3912 non-null   object
dtypes: float64(11), int64(7), object(6)
memory usage: 733.6+ KB
data_cuaca.describe().T
| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| temp | 3912.0 | 21.656570 | 2.629999 | 17.50 | 19.6000 | 20.50 | 23.70 | 29.8 |
| feelslike | 3912.0 | 21.714545 | 2.768646 | 17.50 | 19.6000 | 20.50 | 23.70 | 31.0 |
| dew | 3912.0 | 19.306186 | 1.374440 | 13.90 | 18.5000 | 19.30 | 20.20 | 23.2 |
| humidity | 3912.0 | 87.563298 | 12.125224 | 45.86 | 81.6025 | 92.82 | 96.41 | 100.0 |
| precip | 3912.0 | 0.403655 | 1.427023 | 0.00 | 0.0000 | 0.00 | 0.30 | 33.6 |
| precipprob | 3912.0 | 43.507157 | 49.582975 | 0.00 | 0.0000 | 0.00 | 100.00 | 100.0 |
| snow | 3912.0 | 0.000000 | 0.000000 | 0.00 | 0.0000 | 0.00 | 0.00 | 0.0 |
| snowdepth | 3912.0 | 0.000000 | 0.000000 | 0.00 | 0.0000 | 0.00 | 0.00 | 0.0 |
| windgust | 3912.0 | 4.759509 | 2.367873 | 0.40 | 3.2000 | 4.30 | 6.10 | 26.3 |
| windspeed | 3912.0 | 4.374872 | 1.997276 | 0.00 | 2.9000 | 4.30 | 5.80 | 11.5 |
| winddir | 3912.0 | 159.286682 | 96.700031 | 0.20 | 77.0000 | 102.20 | 255.50 | 359.8 |
| sealevelpressure | 3912.0 | 1011.591513 | 1.718880 | 1007.00 | 1010.0000 | 1012.00 | 1013.00 | 1017.0 |
| cloudcover | 3912.0 | 83.683333 | 26.813950 | 1.20 | 77.5000 | 99.40 | 100.00 | 100.0 |
| visibility | 3912.0 | 18.617357 | 8.085219 | 0.10 | 13.3000 | 24.10 | 24.10 | 24.1 |
| solarradiation | 3912.0 | 203.138548 | 295.852175 | 0.00 | 0.0000 | 5.00 | 380.00 | 1031.0 |
| solarenergy | 2066.0 | 1.385092 | 1.117499 | 0.00 | 0.3000 | 1.20 | 2.40 | 3.7 |
| uvindex | 3912.0 | 2.020450 | 2.965857 | 0.00 | 0.0000 | 0.00 | 4.00 | 10.0 |
| severerisk | 3912.0 | 12.753579 | 9.185640 | 3.00 | 10.0000 | 10.00 | 10.00 | 75.0 |
There is nothing unusual, such as anomalies, in the weather data.
Note:
'preciptype' and 'solarenergy' contain missing values; these columns will be dropped.
b401 = b401_cleaned.copy()
b401.head()
| | Date | Day | Battery | Height |
|---|---|---|---|---|
| 0 | 03/06/2022 00:02 | 03/06/2022 | 12.42 | 24.2 |
| 1 | 03/06/2022 00:12 | 03/06/2022 | 12.42 | NaN |
| 2 | 03/06/2022 00:23 | 03/06/2022 | 12.42 | 24.2 |
| 3 | 03/06/2022 00:26 | 03/06/2022 | 12.42 | NaN |
| 4 | 03/06/2022 00:33 | 03/06/2022 | 12.42 | NaN |
# drop unneeded columns
b401 = b401.drop(columns = ['Day', 'Battery'])
# rename columns
b401.columns = ['date', 'height']
# normalize the date format
b401['date'] = b401['date'] + ':00'
b401['date'] = pd.to_datetime(b401['date'], format='%d/%m/%Y %H:%M:%S')
# fill missing values left by anomaly removal using interpolation
print('(Before interpolation) --- Number of missing values: ', b401.height.isnull().sum())
b401['height'] = b401['height'].interpolate().round(2)
print('(After interpolation) --- Number of missing values: ', b401.height.isnull().sum())
(Before interpolation) --- Number of missing values:  5322
(After interpolation) --- Number of missing values:  0
# sample after interpolation
plt.figure(figsize=(15, 5))
plt.plot(b401['height'][5000:6000], '-')
plt.xlabel('Index', fontsize=12)
plt.ylabel('Height (cm)', fontsize=12)
plt.show()
# resample to 10-minute intervals
print('\n(Before resampling) --- Number of rows: ', len(b401))
b401 = b401.set_index('date')
b401 = b401.resample('10min').mean().round(2)
print('(After resampling) --- Number of rows: ', len(b401))
print('(After resampling) --- Number of missing values: ', b401.height.isnull().sum())
print('(After resampling) --- Percentage of missing values:', (b401.height.isnull().sum()*100/len(b401)).round(2), '%')
# reset index
b401 = b401.reset_index()
(Before resampling) --- Number of rows:  19522
(After resampling) --- Number of rows:  23472
(After resampling) --- Number of missing values:  8376
(After resampling) --- Percentage of missing values: 35.69 %
# sample after resampling
plt.figure(figsize=(15, 5))
plt.plot(b401['height'][5000:6000], '-')
plt.xlabel('Index', fontsize=12)
plt.ylabel('Height (cm)', fontsize=12)
plt.show()
# missing data
print('Total number of rows:', len(b401))
print('Number of missing values:', b401.height.isnull().sum())
print('Percentage of missing values:', (b401.height.isnull().sum() * 100 / len(b401)).round(2), '%')
Total number of rows: 23472
Number of missing values: 8376
Percentage of missing values: 35.69 %
Missing values appear after resampling the datetimes because of time skips in the data, caused by the sensor being inactive for certain periods.
cuaca = data_cuaca.copy()
cuaca.head()
| | name | datetime | temp | feelslike | dew | humidity | precip | precipprob | preciptype | snow | snowdepth | windgust | windspeed | winddir | sealevelpressure | cloudcover | visibility | solarradiation | solarenergy | uvindex | severerisk | conditions | icon | stations |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.957210, 100.478977 | 2022-06-03T00:00:00 | 18.7 | 18.7 | 17.6 | 93.32 | 0.0 | 0 | NaN | 0 | 0 | 3.6 | 4.7 | 68.5 | 1014 | 100.0 | 24.1 | 0 | NaN | 0 | 10 | Overcast | cloudy | remote |
| 1 | -0.957210, 100.478977 | 2022-06-03T01:00:00 | 18.8 | 18.8 | 17.5 | 92.16 | 0.0 | 0 | NaN | 0 | 0 | 4.0 | 5.4 | 73.8 | 1013 | 100.0 | 24.1 | 0 | NaN | 0 | 10 | Overcast | cloudy | remote |
| 2 | -0.957210, 100.478977 | 2022-06-03T02:00:00 | 18.9 | 18.9 | 17.4 | 91.01 | 0.0 | 0 | NaN | 0 | 0 | 4.0 | 5.4 | 73.9 | 1012 | 100.0 | 24.1 | 0 | NaN | 0 | 10 | Overcast | cloudy | remote |
| 3 | -0.957210, 100.478977 | 2022-06-03T03:00:00 | 19.1 | 19.1 | 17.4 | 89.88 | 0.0 | 0 | NaN | 0 | 0 | 4.3 | 6.1 | 80.2 | 1012 | 100.0 | 24.1 | 0 | NaN | 0 | 10 | Overcast | cloudy | remote |
| 4 | -0.957210, 100.478977 | 2022-06-03T04:00:00 | 19.4 | 19.4 | 17.3 | 87.66 | 0.0 | 0 | NaN | 0 | 0 | 4.0 | 5.0 | 76.7 | 1011 | 98.9 | 24.1 | 0 | NaN | 0 | 10 | Overcast | cloudy | remote |
# drop unneeded columns
cuaca = cuaca.drop(columns = ['name', 'snow', 'snowdepth', 'stations'])
# normalize the date format: the 'T' separator must be removed with .str.replace,
# since plain Series.replace only matches cells that equal 'T' exactly
cuaca['datetime'] = cuaca['datetime'].str.replace('T', ' ')
cuaca['datetime'] = pd.to_datetime(cuaca['datetime'], format='%Y-%m-%d %H:%M:%S')
# drop features with missing values
cuaca = cuaca.drop(columns = ['solarenergy', 'preciptype'])
# check all unique values in the categorical columns
for col in cuaca.select_dtypes(include='object').columns.tolist():
    print(f'\nvalue counts of column {col}')
    print(cuaca[col].value_counts(normalize=True)*100)
value counts of column conditions
Overcast                  34.202454
Rain, Overcast            32.157464
Partially cloudy          17.024540
Rain, Partially cloudy    10.429448
Clear                      5.265849
Rain                       0.920245
Name: conditions, dtype: float64

value counts of column icon
rain                   43.507157
cloudy                 33.358896
partly-cloudy-night     8.767894
partly-cloudy-day       8.128834
clear-day               2.837423
clear-night             2.428425
fog                     0.971370
Name: icon, dtype: float64
The categorical columns contain quite a few unique values; they will be dropped at the end.
b401.head(3)
| | date | height |
|---|---|---|
| 0 | 2022-06-03 00:00:00 | 24.20 |
| 1 | 2022-06-03 00:10:00 | 24.20 |
| 2 | 2022-06-03 00:20:00 | 24.88 |
cuaca.head(3)
| | datetime | temp | feelslike | dew | humidity | precip | precipprob | windgust | windspeed | winddir | sealevelpressure | cloudcover | visibility | solarradiation | uvindex | severerisk | conditions | icon |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2022-06-03 00:00:00 | 18.7 | 18.7 | 17.6 | 93.32 | 0.0 | 0 | 3.6 | 4.7 | 68.5 | 1014 | 100.0 | 24.1 | 0 | 0 | 10 | Overcast | cloudy |
| 1 | 2022-06-03 01:00:00 | 18.8 | 18.8 | 17.5 | 92.16 | 0.0 | 0 | 4.0 | 5.4 | 73.8 | 1013 | 100.0 | 24.1 | 0 | 0 | 10 | Overcast | cloudy |
| 2 | 2022-06-03 02:00:00 | 18.9 | 18.9 | 17.4 | 91.01 | 0.0 | 0 | 4.0 | 5.4 | 73.9 | 1012 | 100.0 | 24.1 | 0 | 0 | 10 | Overcast | cloudy |
print('Flood data shape   : ', b401.shape)
print('Weather data shape : ', cuaca.shape)
Flood data shape   :  (23472, 2)
Weather data shape :  (3912, 18)
Since the weather data are available hourly, we create an 'index_join' column containing the date and hour, and use it to combine the two datasets with a left join.
# create the 'index_join' column (date + hour)
b401['index_join'] = b401['date'].astype(str).str[:-6]
cuaca['index_join'] = cuaca['datetime'].astype(str).str[:-6]
# merge the flood and weather data with a left join
merged_data = b401.merge(cuaca, on='index_join', how='left')
# drop the helper columns
merged_data = merged_data.drop(columns = ['datetime', 'index_join'])
merged_data.head(3)
| | date | height | temp | feelslike | dew | humidity | precip | precipprob | windgust | windspeed | winddir | sealevelpressure | cloudcover | visibility | solarradiation | uvindex | severerisk | conditions | icon |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2022-06-03 00:00:00 | 24.20 | 18.7 | 18.7 | 17.6 | 93.32 | 0.0 | 0 | 3.6 | 4.7 | 68.5 | 1014 | 100.0 | 24.1 | 0 | 0 | 10 | Overcast | cloudy |
| 1 | 2022-06-03 00:10:00 | 24.20 | 18.7 | 18.7 | 17.6 | 93.32 | 0.0 | 0 | 3.6 | 4.7 | 68.5 | 1014 | 100.0 | 24.1 | 0 | 0 | 10 | Overcast | cloudy |
| 2 | 2022-06-03 00:20:00 | 24.88 | 18.7 | 18.7 | 17.6 | 93.32 | 0.0 | 0 | 3.6 | 4.7 | 68.5 | 1014 | 100.0 | 24.1 | 0 | 0 | 10 | Overcast | cloudy |
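The hourly join just performed can be sketched in isolation; `dt.floor('h')` is an alternative to slicing the timestamp string, and the rows below are invented for illustration:

```python
import pandas as pd

flood = pd.DataFrame({'date': pd.to_datetime(['2022-06-03 00:00', '2022-06-03 00:10',
                                              '2022-06-03 01:00'])})
weather = pd.DataFrame({'datetime': pd.to_datetime(['2022-06-03 00:00', '2022-06-03 01:00']),
                        'temp': [18.7, 18.8]})

# floor both timestamps to the hour, so every 10-minute flood row
# picks up the weather record of its hour via a left join
flood['key'] = flood['date'].dt.floor('h')
weather['key'] = weather['datetime'].dt.floor('h')
merged = flood.merge(weather, on='key', how='left')
print(merged['temp'].tolist())  # [18.7, 18.7, 18.8]
```

Flooring on real datetimes avoids depending on the exact string layout, which is why it can be a more robust join key than `.astype(str).str[:-6]`.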
merged_data.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 23472 entries, 0 to 23471
Data columns (total 19 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   date              23472 non-null  datetime64[ns]
 1   height            15096 non-null  float64
 2   temp              23472 non-null  float64
 3   feelslike         23472 non-null  float64
 4   dew               23472 non-null  float64
 5   humidity          23472 non-null  float64
 6   precip            23472 non-null  float64
 7   precipprob        23472 non-null  int64
 8   windgust          23472 non-null  float64
 9   windspeed         23472 non-null  float64
 10  winddir           23472 non-null  float64
 11  sealevelpressure  23472 non-null  int64
 12  cloudcover        23472 non-null  float64
 13  visibility        23472 non-null  float64
 14  solarradiation    23472 non-null  int64
 15  uvindex           23472 non-null  int64
 16  severerisk        23472 non-null  int64
 17  conditions        23472 non-null  object
 18  icon              23472 non-null  object
dtypes: datetime64[ns](1), float64(11), int64(5), object(2)
memory usage: 3.6+ MB
The missing values will be filled by data imputation using a regression model; the candidate models are Linear Regression, Random Forest Regressor, and SVR.
Datetime components are extracted into new features to help the model capture temporal information.
# extract datetime components
merged_data['minute'] = merged_data['date'].dt.minute
merged_data['hour'] = merged_data['date'].dt.hour
merged_data['day'] = merged_data['date'].dt.day
merged_data['month'] = merged_data['date'].dt.month
Split the data into a set for training the regression model and a set for prediction:
# data for training the model (rows without missing values)
merged_mod = merged_data[~merged_data['height'].isna()].reset_index(drop=True)
# data for prediction (only rows with missing values)
merged_pred = merged_data[merged_data['height'].isna()].reset_index(drop=True)
Pearson correlation, via the .corr() function, is used to inspect the correlations between features.
Features that correlate strongly with 'height' but not with the other columns will be selected as predictors.
# correlation heatmap
plt.figure(figsize=(15,8))
sns.heatmap(merged_mod.corr(numeric_only=True), annot=True, cmap='coolwarm', fmt='.2f')
plt.show()
Conclusion: features with a pairwise correlation >= 0.7 are redundant; judging by the selection below, these are 'temp', 'feelslike', 'winddir', 'solarradiation', and 'uvindex'.
These features will not be used to train the regression models.
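The visual inspection of the heatmap can also be done programmatically; here is a sketch on toy data (column names invented) of flagging one column from each pair with |corr| >= 0.7:

```python
import numpy as np
import pandas as pd

# toy frame: 'b' is nearly a copy of 'a'; 'c' is independent noise
rng = np.random.default_rng(0)
a = rng.normal(size=200)
df = pd.DataFrame({'a': a,
                   'b': a + rng.normal(scale=0.1, size=200),
                   'c': rng.normal(size=200)})

# keep only the upper triangle of |corr| so each pair is counted once
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
redundant = [col for col in upper.columns if (upper[col] >= 0.7).any()]
print(redundant)  # ['b']
```

Dropping the columns in `redundant` reproduces the manual feature selection done in the next cell, with the threshold made explicit.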
# select features
merged_mod = merged_mod[['dew', 'humidity', 'precip', 'precipprob', 'windgust', 'windspeed', 'sealevelpressure',
                         'cloudcover', 'visibility', 'severerisk', 'minute', 'hour', 'day', 'month', 'height']]
# split
X = merged_mod.drop('height', axis=1)
y = merged_mod['height']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# normalization
scaler_X = MinMaxScaler().fit(X_train) # fit only on the training data to avoid data leakage
X_train = scaler_X.transform(X_train)
X_test = scaler_X.transform(X_test)
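A minimal illustration of the fit-on-train rule used above (numbers made up): the scaler's min/max come from the training rows only, so unseen test values may map outside [0, 1]. That is fine; fitting on all rows instead would leak test statistics into training.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

train = np.array([[0.0], [16.0], [32.0]])
test = np.array([[48.0]])            # value outside the training range

scaler = MinMaxScaler().fit(train)   # min=0, max=32 learned from train only
print(scaler.transform(test))        # [[1.5]] — test values may exceed 1
```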
# MODEL
# linear regression
lr = LinearRegression()
lr.fit(X_train, y_train)
# random forest regressor
rfr = RandomForestRegressor()
rfr.fit(X_train, y_train)
# support vector regressor
svr = SVR()
svr.fit(X_train, y_train)
# EVAL
# dataframe for the evaluation results ('results' avoids shadowing Python's built-in eval)
results = pd.DataFrame(index=['train_mae', 'test_mae'],
                       columns=['Linear Regression', 'Random Forest Regressor', 'SVR'])
# MAE on the training data
results.loc['train_mae', 'Linear Regression'] = mean_absolute_error(y_train, lr.predict(X_train))
results.loc['train_mae', 'Random Forest Regressor'] = mean_absolute_error(y_train, rfr.predict(X_train))
results.loc['train_mae', 'SVR'] = mean_absolute_error(y_train, svr.predict(X_train))
# MAE on the test data
results.loc['test_mae', 'Linear Regression'] = mean_absolute_error(y_test, lr.predict(X_test))
results.loc['test_mae', 'Random Forest Regressor'] = mean_absolute_error(y_test, rfr.predict(X_test))
results.loc['test_mae', 'SVR'] = mean_absolute_error(y_test, svr.predict(X_test))
# evaluation results
results.transpose()
| | train_mae | test_mae |
|---|---|---|
| Linear Regression | 19.851786 | 19.603865 |
| Random Forest Regressor | 1.169776 | 2.988519 |
| SVR | 15.274559 | 15.109103 |
The best regression model for predicting 'height' is the Random Forest Regressor.
# prepare the data
X_pred = merged_pred[['dew', 'humidity', 'precip',
                      'precipprob', 'windgust', 'windspeed', 'sealevelpressure',
                      'cloudcover', 'visibility', 'severerisk',
                      'minute', 'hour', 'day', 'month']]
# scaling
X_pred = scaler_X.transform(X_pred)
# predict
y_pred = rfr.predict(X_pred)
# display as a table
y_pred = pd.DataFrame(y_pred)
y_pred.columns = ['height_pred']
y_pred
y_pred
| | height_pred |
|---|---|
| 0 | 29.1101 |
| 1 | 57.9764 |
| 2 | 54.9229 |
| 3 | 54.8268 |
| 4 | 54.3310 |
| ... | ... |
| 8371 | 99.5001 |
| 8372 | 101.2404 |
| 8373 | 103.9473 |
| 8374 | 104.9046 |
| 8375 | 103.8737 |
8376 rows × 1 columns
# add the date column to y_pred
date = merged_pred[['date']]
y_pred = pd.concat([date, y_pred.reindex(date.index)], axis=1)
# final dataset
dataset_final = merged_data.copy()
dataset_final['height_0'] = dataset_final['height'] # kept for plotting
dataset_final = dataset_final.merge(y_pred, on='date', how='left') # attach y_pred
dataset_final['height'] = dataset_final['height'].fillna(dataset_final['height_pred']) # fill NaN with the predictions
# observed vs predicted heights
plt.figure(figsize=(15, 5))
plt.plot(dataset_final['height_0'], '.', label='height')
plt.plot(dataset_final['height_pred'], '.', label='height pred')
plt.xlabel('Index', fontsize=12)
plt.ylabel('Height (cm)', fontsize=12)
plt.legend()
plt.show()
# observed vs predicted heights (sample)
plt.figure(figsize=(15, 5))
plt.plot(dataset_final['height_0'][5000:6000], '.', label='height')
plt.plot(dataset_final['height_pred'][5000:6000], '.', label='height pred')
plt.xlabel('Index', fontsize=12)
plt.ylabel('Height (cm)', fontsize=12)
plt.legend()
plt.show()
# final height series
plt.figure(figsize=(15, 5))
plt.plot(dataset_final['height'], '-')
plt.xlabel('Index', fontsize=12)
plt.ylabel('Height (cm)', fontsize=12)
plt.show()
dataset_final['height'].isna().sum()
0
There are no missing values left, but to make absolutely sure the data are clean, they are checked once more.
# # drop unneeded features
# dataset_final = dataset_final.drop(columns = ['conditions', 'icon', 'minute', 'hour', 'day', 'month', 'height_0', 'height_pred'])
# # save dataset
# save_data = dataset_final.copy()
# save_data.to_csv('dataset/dataset_saved/save_databanjir.csv')
# # read dataset
# dataset_final_cleaned = pd.read_csv('dataset/dataset_saved/save_databanjir.csv')
# # interpolation
# dataset_final_cleaned['height'] = dataset_final_cleaned['height'].round(2)
# dataset_final_cleaned['height'] = dataset_final_cleaned['height'].fillna(dataset_final_cleaned['height'].interpolate()).round(2)
# dataset_final_cleaned['height'].isna().sum()
The data are now free of anomalies and missing values.
# # save dataset
# dataset_final_cleaned.to_csv('dataset/dataset_saved/dataset_final_cleaned.csv')
Because the data fluctuate, smoothing is applied to make it easier for the prediction model to capture the patterns in the data.
# read dataset
read_data = pd.read_csv('dataset/dataset_saved/dataset_final_cleaned.csv')
# smoothing with the Savitzky-Golay method
s1 = read_data['height'].values
height_savgol = savgol_filter(s1,
                              window_length=15,
                              polyorder=1)
'''
Larger window_length = smoother
Larger polyorder     = less smooth
'''
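The trade-off described in the note can be checked on synthetic data; a sketch (the signal and noise level are made up) showing that the filter preserves length and pulls the series toward the underlying trend:

```python
import numpy as np
from scipy.signal import savgol_filter

# noisy linear ramp: a polyorder-1 filter should recover it well
x = np.linspace(0, 1, 200)
noisy = x + np.random.default_rng(1).normal(scale=0.05, size=200)

smooth = savgol_filter(noisy, window_length=15, polyorder=1)

# length is preserved, and the residual against the clean ramp shrinks
print(len(smooth) == len(noisy), np.std(smooth - x) < np.std(noisy - x))
```

Increasing `window_length` would shrink the residual further at the cost of blurring sharp flood peaks, which is why a moderate window of 15 is used on the height series.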
# store the smoothed values
read_data['height'] = height_savgol
# show a two-day sample
plt.figure(figsize=(15, 5))
plt.plot(s1[500:788], '.-')
plt.plot(height_savgol[500:788], 'r')
plt.ylabel('Height (cm)', fontsize=12)
plt.show()
# a larger sample
plt.figure(figsize=(15, 5))
plt.plot(s1[10000:13000], '.-')
plt.plot(height_savgol[10000:13000], 'r')
plt.ylabel('Height (cm)', fontsize=12)
plt.show()
# # save dataset
# read_data.to_csv('dataset/dataset_saved/dataset_final_smooth15.csv')
Because the dry-season and rainy-season portions of the data are imbalanced, only the dry-season data will be used.
# read dataset
read_data = pd.read_csv('dataset/dataset_saved/dataset_final_smooth15.csv')
# plot data
plt.figure(figsize=(15, 5))
plt.plot(read_data.height[:17200], '-')
plt.plot(read_data.height[17200:], '-')
plt.xlabel('Index', fontsize=12)
plt.ylabel('Height (cm)', fontsize=12)
plt.axvline(17200, color='black', linestyle='--')
plt.legend(['Dry-season data', 'Rainy-season data'])
plt.show()
# select data
read_data = read_data[:17200]
read_data
| | Unnamed: 0 | date | height | temp | feelslike | dew | humidity | precip | precipprob | windgust | windspeed | winddir | sealevelpressure | cloudcover | visibility | solarradiation | uvindex | severerisk |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 03/06/2022 00:00 | 23.522750 | 18.7 | 18.7 | 17.6 | 93.32 | 0.0 | 0 | 3.6 | 4.7 | 68.5 | 1014 | 100.0 | 24.1 | 0 | 0 | 10 |
| 1 | 1 | 03/06/2022 00:10 | 24.445214 | 18.7 | 18.7 | 17.6 | 93.32 | 0.0 | 0 | 3.6 | 4.7 | 68.5 | 1014 | 100.0 | 24.1 | 0 | 0 | 10 |
| 2 | 2 | 03/06/2022 00:20 | 25.367679 | 18.7 | 18.7 | 17.6 | 93.32 | 0.0 | 0 | 3.6 | 4.7 | 68.5 | 1014 | 100.0 | 24.1 | 0 | 0 | 10 |
| 3 | 3 | 03/06/2022 00:30 | 26.290143 | 18.7 | 18.7 | 17.6 | 93.32 | 0.0 | 0 | 3.6 | 4.7 | 68.5 | 1014 | 100.0 | 24.1 | 0 | 0 | 10 |
| 4 | 4 | 03/06/2022 00:40 | 27.212607 | 18.7 | 18.7 | 17.6 | 93.32 | 0.0 | 0 | 3.6 | 4.7 | 68.5 | 1014 | 100.0 | 24.1 | 0 | 0 | 10 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 17195 | 17195 | 30/09/2022 09:50 | 66.340000 | 25.4 | 25.4 | 19.4 | 69.42 | 0.0 | 0 | 6.5 | 3.6 | 314.2 | 1012 | 86.5 | 24.1 | 545 | 5 | 10 |
| 17196 | 17196 | 30/09/2022 10:00 | 66.380667 | 26.8 | 28.1 | 19.6 | 64.70 | 0.0 | 0 | 8.6 | 5.4 | 292.1 | 1011 | 100.0 | 24.1 | 756 | 8 | 10 |
| 17197 | 17197 | 30/09/2022 10:10 | 66.713333 | 26.8 | 28.1 | 19.6 | 64.70 | 0.0 | 0 | 8.6 | 5.4 | 292.1 | 1011 | 100.0 | 24.1 | 756 | 8 | 10 |
| 17198 | 17198 | 30/09/2022 10:20 | 66.912000 | 26.8 | 28.1 | 19.6 | 64.70 | 0.0 | 0 | 8.6 | 5.4 | 292.1 | 1011 | 100.0 | 24.1 | 756 | 8 | 10 |
| 17199 | 17199 | 30/09/2022 10:30 | 66.708667 | 26.8 | 28.1 | 19.6 | 64.70 | 0.0 | 0 | 8.6 | 5.4 | 292.1 | 1011 | 100.0 | 24.1 | 756 | 8 | 10 |
17200 rows × 18 columns
# plot data
plt.figure(figsize=(15, 5))
plt.plot(read_data.height[:14000], '-')
plt.plot(read_data.height[14000:], '-')
plt.xlabel('Index', fontsize=12)
plt.ylabel('Height (cm)', fontsize=12)
plt.axvline(14000, color='black', linestyle='--')
plt.legend(['Modeling data', 'Simulation data'])
plt.show()
# number of rows
print('Total modeling rows:', len(read_data[:14000]))
print('Total simulation rows:', len(read_data[14000:]))
Total modeling rows: 14000
Total simulation rows: 3200
# # SPLIT
# # save the modeling data
# data_modeling_banjir = read_data[:14000]
# data_modeling_banjir.to_csv('dataset/data_modeling_banjir.csv')
# # save the simulation data
# data_simulasi_banjir = read_data[14000:]
# data_simulasi_banjir.to_csv('dataset/data_simulasi_banjir.csv')
# import libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from scipy import stats
from scipy.signal import savgol_filter
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
import os
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, GRU
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.metrics import mean_absolute_error, mean_squared_error
# setting
# display max columns
pd.set_option('display.max_columns', None)
# read the modeling dataset
df = pd.read_csv('dataset/data_modeling_banjir.csv')
# df = df.drop(columns = ['Unnamed: 0'])
# read the full dataset (to look at the flood events recorded/reported in the media)
df_all = pd.read_csv('dataset/dataset_saved/dataset_final_smooth15.csv')
df_all = df_all.drop(columns = ['Unnamed: 0'])
Flood events:
# normalize the date format
df_all['date'] = df_all['date'] + ':00'
df_all['date'] = pd.to_datetime(df_all['date'], format='%d/%m/%Y %H:%M:%S')
df_all = df_all.set_index('date')
# show the full dataset
df_all.head(3)
| | height | temp | feelslike | dew | humidity | precip | precipprob | windgust | windspeed | winddir | sealevelpressure | cloudcover | visibility | solarradiation | uvindex | severerisk |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| date | ||||||||||||||||
| 2022-06-03 00:00:00 | 23.522750 | 18.7 | 18.7 | 17.6 | 93.32 | 0.0 | 0 | 3.6 | 4.7 | 68.5 | 1014 | 100.0 | 24.1 | 0 | 0 | 10 |
| 2022-06-03 00:10:00 | 24.445214 | 18.7 | 18.7 | 17.6 | 93.32 | 0.0 | 0 | 3.6 | 4.7 | 68.5 | 1014 | 100.0 | 24.1 | 0 | 0 | 10 |
| 2022-06-03 00:20:00 | 25.367679 | 18.7 | 18.7 | 17.6 | 93.32 | 0.0 | 0 | 3.6 | 4.7 | 68.5 | 1014 | 100.0 | 24.1 | 0 | 0 | 10 |
# flood event 1
df_all['2022-06-11 20:50:00':'2022-06-11 20:50:00']
| | height | temp | feelslike | dew | humidity | precip | precipprob | windgust | windspeed | winddir | sealevelpressure | cloudcover | visibility | solarradiation | uvindex | severerisk |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| date | ||||||||||||||||
| 2022-06-11 20:50:00 | 187.087333 | 19.8 | 19.8 | 19.9 | 100.0 | 0.0 | 0 | 13.3 | 4.7 | 278.0 | 1012 | 100.0 | 0.1 | 0 | 0 | 10 |
# flood 1
plt.figure(figsize=(10, 3))
plt.plot(df_all['height']['2022-06-11 12:00:00':'2022-06-11 23:00:00'], '.-')
plt.plot(df_all['height']['2022-06-11 21:00:00':'2022-06-11 21:00:00'], 'ro')
plt.xlabel('Datetime', fontsize=12)
plt.ylabel('Height (cm)', fontsize=12)
plt.legend(['Height', 'Height during flood 1'], loc='lower right')
plt.axhline(y=150, color='black', linestyle='--')
plt.show()
# flood event 2
df_all['2022-10-02 03:00:00':'2022-10-02 03:00:00']
| height | temp | feelslike | dew | humidity | precip | precipprob | windgust | windspeed | winddir | sealevelpressure | cloudcover | visibility | solarradiation | uvindex | severerisk | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| date | ||||||||||||||||
| 2022-10-02 03:00:00 | 206.006 | 19.9 | 19.9 | 19.5 | 97.55 | 0.6 | 100 | 1.1 | 1.1 | 138.2 | 1010 | 47.7 | 17.3 | 0 | 0 | 10 |
# flood 2
plt.figure(figsize=(10, 3))
plt.plot(df_all['height']['2022-10-01 18:00:00':'2022-10-02 06:00:00'], '.-')
plt.plot(df_all['height']['2022-10-02 03:00:00':'2022-10-02 03:00:00'], 'ro')
plt.xlabel('Datetime', fontsize= 12)
plt.ylabel('Height (cm)', fontsize= 12)
plt.legend(['Height', 'Height during flood 2'], loc='lower right')
plt.axhline(y=150, color='black', linestyle='--')
plt.show()
# flood event 3
df_all['2022-11-11 18:30:00':'2022-11-11 18:30:00']
| height | temp | feelslike | dew | humidity | precip | precipprob | windgust | windspeed | winddir | sealevelpressure | cloudcover | visibility | solarradiation | uvindex | severerisk | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| date | ||||||||||||||||
| 2022-11-11 18:30:00 | 182.850667 | 20.7 | 20.7 | 20.4 | 98.17 | 0.5 | 100 | 4.7 | 2.5 | 205.3 | 1011 | 91.7 | 3.7 | 22 | 0 | 10 |
# flood 3
plt.figure(figsize=(10, 3))
plt.plot(df_all['height']['2022-11-11 12:00:00':'2022-11-11 23:00:00'], '.-')
plt.plot(df_all['height']['2022-11-11 18:30:00':'2022-11-11 18:30:00'], 'ro')
plt.xlabel('Datetime', fontsize= 12)
plt.ylabel('Height (cm)', fontsize= 12)
plt.legend(['Height', 'Height during flood 3'], loc='lower right')
plt.axhline(y=150, color='black', linestyle='--')
plt.show()
Across all three events, flooding occurred once the water height exceeded 150 cm. The label is therefore split into three levels: safe (height ≤ 100 cm), alert 1 (100 cm < height ≤ 150 cm), and alert 2 (height > 150 cm).
# define safe (aman)=0, alert 1 (siaga 1)=1, alert 2 (siaga 2)=2
df['status'] = np.where(df['height'] <= 100, 0,
np.where(df['height'] <= 150, 1,
2))
df[['date','height','status']].sample(5)
| date | height | status | |
|---|---|---|---|
| 8263 | 30/07/2022 09:10 | 113.526667 | 1 |
| 660 | 07/06/2022 14:00 | 29.861333 | 0 |
| 327 | 05/06/2022 06:30 | 42.782667 | 0 |
| 13257 | 03/09/2022 01:30 | 173.955333 | 2 |
| 9646 | 08/08/2022 23:40 | 39.408667 | 0 |
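The nested `np.where` above can also be written with `pd.cut`, whose default right-closed bins match the `<=` comparisons exactly (a sketch with toy heights, not from the original notebook):

```python
import numpy as np
import pandas as pd

# toy heights spanning the three levels defined above
heights = pd.Series([29.86, 113.53, 173.96])
# bins (-inf, 100], (100, 150], (150, inf] -> labels 0, 1, 2
status = pd.cut(heights, bins=[-np.inf, 100, 150, np.inf], labels=[0, 1, 2]).astype(int)
print(status.tolist())  # → [0, 1, 2]
```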
df['status'].value_counts()
0    13427
1      433
2      140
Name: status, dtype: int64
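The label distribution is heavily skewed (13,427 / 433 / 140), so the majority class dominates the loss. One common mitigation, not used in the original notebook, is to weight the loss per class and pass the result to Keras via `fit(..., class_weight=...)` (a hedged sketch; `y_toy` just reproduces the counts above):

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# counts taken from value_counts() above
y_toy = np.array([0] * 13427 + [1] * 433 + [2] * 140)
weights = compute_class_weight(class_weight='balanced', classes=np.array([0, 1, 2]), y=y_toy)
class_weights = dict(enumerate(weights))  # e.g. {0: ~0.35, 1: ~10.8, 2: ~33.3}
print(class_weights)
```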
# split features/predictors and label
X = df[['height']]
y = df[['status']]
# split the dataset into train, val, and test sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X_train, y_train, test_size=0.1, random_state=42)
# scale/normalize the data
'''
fit only on the training data
to avoid data leakage
'''
scaler_X = MinMaxScaler().fit(X_train)
X_train = scaler_X.transform(X_train)
X_val = scaler_X.transform(X_val)
X_test = scaler_X.transform(X_test)
X_train.shape, y_train.shape, X_val.shape, y_val.shape, X_test.shape, y_test.shape
((10080, 1), (10080, 1), (2800, 1), (2800, 1), (1120, 1), (1120, 1))
import joblib
joblib.dump(scaler_X, 'scaler/scaler_X_klasifikasi.save')
['scaler/scaler_X_klasifikasi.save']
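At inference time the same scaler must be reloaded so new readings are transformed exactly as the training data was. A self-contained round-trip sketch (the toy fit range and temp path are assumptions; in the notebook you would `joblib.load('scaler/scaler_X_klasifikasi.save')`):

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# toy scaler fit on an assumed height range of [20, 210] cm
toy_scaler = MinMaxScaler().fit(np.array([[20.0], [210.0]]))
path = os.path.join(tempfile.mkdtemp(), 'scaler_X_klasifikasi.save')
joblib.dump(toy_scaler, path)

reloaded = joblib.load(path)
print(reloaded.transform([[115.0]])[0, 0])  # → 0.5 (midpoint of the toy range)
```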
# build the GRU classification model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, GRU
model_klasifikasi = Sequential()
model_klasifikasi.add(GRU(32, input_shape=(None, 1)))
model_klasifikasi.add(Dense(3, activation='softmax'))
model_klasifikasi.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
gru (GRU) (None, 32) 3360
dense (Dense) (None, 3) 99
=================================================================
Total params: 3,459
Trainable params: 3,459
Non-trainable params: 0
_________________________________________________________________
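The 3,360 GRU parameters in the summary can be checked by hand. In TF 2.x the GRU layer defaults to `reset_after=True`, so each of the 3 gates has an input kernel, a recurrent kernel, and two bias vectors (this arithmetic is a verification sketch, not part of the original notebook):

```python
units, features = 32, 1
# 3 gates × (input kernel + recurrent kernel + 2 bias vectors per gate)
gru_params = 3 * (features * units + units * units + 2 * units)
dense_params = units * 3 + 3  # 32 → 3 softmax layer: weights + biases
print(gru_params, dense_params)  # → 3360 99
```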
model_klasifikasi.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
history_klasifikasi = model_klasifikasi.fit(X_train, y_train,
validation_data=(X_val, y_val),
epochs=20, batch_size=32)
Epoch  1/20 - 5s 8ms/step - loss: 0.4130 - accuracy: 0.9561 - val_loss: 0.1969 - val_accuracy: 0.9600
Epoch  2/20 - 2s 6ms/step - loss: 0.1693 - accuracy: 0.9586 - val_loss: 0.1358 - val_accuracy: 0.9600
Epoch  3/20 - 2s 6ms/step - loss: 0.1118 - accuracy: 0.9587 - val_loss: 0.0870 - val_accuracy: 0.9600
Epoch  4/20 - 2s 6ms/step - loss: 0.0744 - accuracy: 0.9644 - val_loss: 0.0629 - val_accuracy: 0.9675
Epoch  5/20 - 2s 6ms/step - loss: 0.0555 - accuracy: 0.9724 - val_loss: 0.0495 - val_accuracy: 0.9746
Epoch  6/20 - 2s 6ms/step - loss: 0.0446 - accuracy: 0.9809 - val_loss: 0.0411 - val_accuracy: 0.9857
Epoch  7/20 - 2s 7ms/step - loss: 0.0367 - accuracy: 0.9878 - val_loss: 0.0340 - val_accuracy: 0.9846
Epoch  8/20 - 2s 7ms/step - loss: 0.0310 - accuracy: 0.9909 - val_loss: 0.0292 - val_accuracy: 0.9889
Epoch  9/20 - 2s 7ms/step - loss: 0.0265 - accuracy: 0.9925 - val_loss: 0.0252 - val_accuracy: 0.9946
Epoch 10/20 - 2s 7ms/step - loss: 0.0231 - accuracy: 0.9941 - val_loss: 0.0223 - val_accuracy: 0.9946
Epoch 11/20 - 2s 7ms/step - loss: 0.0205 - accuracy: 0.9955 - val_loss: 0.0192 - val_accuracy: 0.9964
Epoch 12/20 - 2s 7ms/step - loss: 0.0183 - accuracy: 0.9962 - val_loss: 0.0173 - val_accuracy: 0.9964
Epoch 13/20 - 2s 7ms/step - loss: 0.0164 - accuracy: 0.9969 - val_loss: 0.0170 - val_accuracy: 0.9925
Epoch 14/20 - 3s 8ms/step - loss: 0.0150 - accuracy: 0.9971 - val_loss: 0.0145 - val_accuracy: 0.9961
Epoch 15/20 - 2s 8ms/step - loss: 0.0139 - accuracy: 0.9974 - val_loss: 0.0139 - val_accuracy: 0.9946
Epoch 16/20 - 2s 7ms/step - loss: 0.0129 - accuracy: 0.9974 - val_loss: 0.0124 - val_accuracy: 0.9968
Epoch 17/20 - 3s 8ms/step - loss: 0.0118 - accuracy: 0.9979 - val_loss: 0.0124 - val_accuracy: 0.9954
Epoch 18/20 - 2s 7ms/step - loss: 0.0112 - accuracy: 0.9983 - val_loss: 0.0109 - val_accuracy: 0.9986
Epoch 19/20 - 2s 7ms/step - loss: 0.0104 - accuracy: 0.9982 - val_loss: 0.0102 - val_accuracy: 0.9996
Epoch 20/20 - 2s 7ms/step - loss: 0.0099 - accuracy: 0.9983 - val_loss: 0.0094 - val_accuracy: 0.9993
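The run above uses a fixed 20 epochs. Keras's `EarlyStopping(monitor='val_loss', patience=...)` callback (imported further down for the regression part) automates the stopping decision; its core rule is easy to sketch on the recorded `val_loss` curve, and shows it would not have triggered here:

```python
def early_stop_epoch(val_losses, patience):
    """Return the epoch at which a patience-based early stop fires, else None."""
    best, best_epoch = float('inf'), 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch  # no improvement for `patience` epochs
    return None

# val_loss values copied from the training log above
val_loss = [0.1969, 0.1358, 0.0870, 0.0629, 0.0495, 0.0411, 0.0340, 0.0292,
            0.0252, 0.0223, 0.0192, 0.0173, 0.0170, 0.0145, 0.0139, 0.0124,
            0.0124, 0.0109, 0.0102, 0.0094]
print(early_stop_epoch(val_loss, patience=3))  # → None: val_loss kept improving
```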
# plot loss & accuracy
plt.figure(figsize=(12, 4))
plt.subplot(121) # row , col , index
plt.plot(history_klasifikasi.history['loss'])
plt.plot(history_klasifikasi.history['val_loss'])
plt.ylabel('Loss')
plt.xlabel('epoch')
plt.legend(['Train loss', 'Validation loss'], loc='upper right')
plt.subplot(122)
plt.plot(history_klasifikasi.history['accuracy'])
plt.plot(history_klasifikasi.history['val_accuracy'])
plt.ylabel('Accuracy')
plt.xlabel('epoch')
plt.legend(['Train accuracy', 'Validation accuracy'], loc='lower right')
plt.show()
# evaluate the model
train_scores = model_klasifikasi.evaluate(X_train, y_train, verbose=0)
val_scores = model_klasifikasi.evaluate(X_val, y_val, verbose=0)
test_scores = model_klasifikasi.evaluate(X_test, y_test, verbose=0)
# dataframe summarizing model evaluation
df_eval = pd.DataFrame(index=['train', 'val', 'test'],
columns=['loss', 'accuracy'])
df_eval.loc['train', 'loss']=train_scores[0]
df_eval.loc['train', 'accuracy']=train_scores[1]
df_eval.loc['val', 'loss']=val_scores[0]
df_eval.loc['val', 'accuracy']=val_scores[1]
df_eval.loc['test', 'loss']=test_scores[0]
df_eval.loc['test', 'accuracy']=test_scores[1]
df_eval
| loss | accuracy | |
|---|---|---|
| train | 0.009286 | 0.999008 |
| val | 0.009436 | 0.999286 |
| test | 0.008688 | 0.999107 |
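With ~96% of rows in class 0, accuracy alone can hide poor minority-class recall, so a per-class breakdown is worth checking. A minimal self-contained sketch with toy labels (not the notebook's arrays):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# toy example: one class-1 sample mispredicted as class 0
y_true = np.array([0, 0, 0, 1, 1, 2])
y_pred = np.array([0, 0, 0, 0, 1, 2])
cm = confusion_matrix(y_true, y_pred)  # rows = true class, columns = predicted class
print(cm)
```

Applied to the notebook's own variables this would be `confusion_matrix(y_test, np.argmax(model_klasifikasi.predict(X_test), axis=1))`.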
sample_test = pd.DataFrame(X_test, columns = ['height_scaled'])
sample_test = sample_test.sample(1).reset_index(drop=True)
# inverse transform
sample_test_inverse = scaler_X.inverse_transform(sample_test[['height_scaled']])
sample_test_inverse = pd.DataFrame(sample_test_inverse,columns = ['height'])
sample_test=sample_test.join(sample_test_inverse)
sample_test
| height_scaled | height | |
|---|---|---|
| 0 | 0.287381 | 74.34 |
pred = model_klasifikasi.predict(sample_test[['height_scaled']])
pred = np.argmax(pred, axis=1)
pred
1/1 [==============================] - 0s 24ms/step
array([0], dtype=int64)
pred = pd.DataFrame(pred,columns = ['status_pred'])
prediksi = sample_test.join(pred)
prediksi
| height_scaled | height | status_pred | |
|---|---|---|---|
| 0 | 0.287381 | 74.34 | 0 |
sample_test = pd.DataFrame(X_test, columns = ['height_scaled'])
sample_test = sample_test.sample(6).reset_index(drop=True)
# inverse transform
sample_test_inverse = scaler_X.inverse_transform(sample_test[['height_scaled']])
sample_test_inverse = pd.DataFrame(sample_test_inverse,columns = ['height'])
sample_test=sample_test.join(sample_test_inverse)
sample_test
| height_scaled | height | |
|---|---|---|
| 0 | 0.327763 | 81.698000 |
| 1 | 0.908770 | 187.561333 |
| 2 | 0.296122 | 75.932667 |
| 3 | 0.063020 | 33.460000 |
| 4 | 0.361231 | 87.796000 |
| 5 | 0.147887 | 48.923333 |
pred = model_klasifikasi.predict(sample_test[['height_scaled']])
pred = np.argmax(pred, axis=1)
pred
1/1 [==============================] - 0s 26ms/step
array([0, 2, 0, 0, 0, 0], dtype=int64)
pred = pd.DataFrame(pred,columns = ['status_pred'])
prediksi = sample_test.join(pred)
prediksi
| height_scaled | height | status_pred | |
|---|---|---|---|
| 0 | 0.327763 | 81.698000 | 0 |
| 1 | 0.908770 | 187.561333 | 2 |
| 2 | 0.296122 | 75.932667 | 0 |
| 3 | 0.063020 | 33.460000 | 0 |
| 4 | 0.361231 | 87.796000 | 0 |
| 5 | 0.147887 | 48.923333 | 0 |
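The sampling cells above show that raw heights must pass through the saved scaler before reaching the network. That flow can be wrapped in a small helper (a sketch; `predict_status` is a hypothetical name, and `model` is any object with a Keras-style `predict()` — in the notebook you would pass `scaler_X` and `model_klasifikasi`):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

def predict_status(height_cm, scaler, model):
    """Scale a raw height (cm) and return the argmax class of model.predict."""
    x = scaler.transform(np.array([[float(height_cm)]]))
    probs = model.predict(x)
    return int(np.argmax(probs, axis=1)[0])
```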
# save the model in .h5 format
model_klasifikasi.save('model/model_klasifikasi_banjir.h5')
# load the model from HDF5 format
from tensorflow.keras.models import load_model
loaded_model = load_model('model/model_klasifikasi_banjir.h5')
# # evaluate the loaded model
# train_scores = loaded_model.evaluate(X_train, y_train, verbose=0)
# val_scores = loaded_model.evaluate(X_val, y_val, verbose=0)
# test_scores = loaded_model.evaluate(X_test, y_test, verbose=0)
# # dataframe summarizing model evaluation
# df_eval = pd.DataFrame(index=['train', 'val', 'test'],
# columns=['loss', 'accuracy'])
# df_eval.loc['train', 'loss']=train_scores[0]
# df_eval.loc['train', 'accuracy']=train_scores[1]
# df_eval.loc['val', 'loss']=val_scores[0]
# df_eval.loc['val', 'accuracy']=val_scores[1]
# df_eval.loc['test', 'loss']=test_scores[0]
# df_eval.loc['test', 'accuracy']=test_scores[1]
# df_eval
| loss | accuracy | |
|---|---|---|
| train | 0.009286 | 0.999008 |
| val | 0.009436 | 0.999286 |
| test | 0.008688 | 0.999107 |
# import libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from scipy import stats
from scipy.signal import savgol_filter
from scipy.stats import boxcox
from sklearn.preprocessing import MinMaxScaler, RobustScaler
import os
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, GRU, Dropout, Bidirectional
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.losses import MeanSquaredError
from tensorflow.keras.metrics import RootMeanSquaredError, MeanAbsolutePercentageError
from sklearn.metrics import mean_absolute_error, mean_squared_error
import warnings
import sys
# setting
# display max columns
pd.set_option('display.max_columns', None)
# ignore warnings
warnings.filterwarnings('ignore')
# system info
print('python/system version:', sys.version)
print('tf version:', tf.__version__)
print('gpu num:', len(tf.config.experimental.list_physical_devices('GPU')))
print('cuda:', tf.test.is_built_with_cuda())
python/system version: 3.9.16 (main, Jan 11 2023, 16:16:36) [MSC v.1916 64 bit (AMD64)]
tf version: 2.10.0
gpu num: 1
cuda: True
# read dataset
data = pd.read_csv('dataset/data_modeling_banjir.csv')
data
| date | height | temp | feelslike | dew | humidity | precip | precipprob | windgust | windspeed | winddir | sealevelpressure | cloudcover | visibility | solarradiation | uvindex | severerisk | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 03/06/2022 00:00 | 23.522750 | 18.7 | 18.7 | 17.6 | 93.32 | 0.0 | 0 | 3.6 | 4.7 | 68.5 | 1014 | 100.0 | 24.1 | 0 | 0 | 10 |
| 1 | 03/06/2022 00:10 | 24.445214 | 18.7 | 18.7 | 17.6 | 93.32 | 0.0 | 0 | 3.6 | 4.7 | 68.5 | 1014 | 100.0 | 24.1 | 0 | 0 | 10 |
| 2 | 03/06/2022 00:20 | 25.367679 | 18.7 | 18.7 | 17.6 | 93.32 | 0.0 | 0 | 3.6 | 4.7 | 68.5 | 1014 | 100.0 | 24.1 | 0 | 0 | 10 |
| 3 | 03/06/2022 00:30 | 26.290143 | 18.7 | 18.7 | 17.6 | 93.32 | 0.0 | 0 | 3.6 | 4.7 | 68.5 | 1014 | 100.0 | 24.1 | 0 | 0 | 10 |
| 4 | 03/06/2022 00:40 | 27.212607 | 18.7 | 18.7 | 17.6 | 93.32 | 0.0 | 0 | 3.6 | 4.7 | 68.5 | 1014 | 100.0 | 24.1 | 0 | 0 | 10 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 13995 | 08/09/2022 04:30 | 58.891333 | 18.6 | 18.6 | 17.6 | 93.91 | 0.0 | 0 | 4.7 | 6.5 | 84.6 | 1013 | 100.0 | 24.1 | 0 | 0 | 10 |
| 13996 | 08/09/2022 04:40 | 59.565333 | 18.6 | 18.6 | 17.6 | 93.91 | 0.0 | 0 | 4.7 | 6.5 | 84.6 | 1013 | 100.0 | 24.1 | 0 | 0 | 10 |
| 13997 | 08/09/2022 04:50 | 59.687333 | 18.6 | 18.6 | 17.6 | 93.91 | 0.0 | 0 | 4.7 | 6.5 | 84.6 | 1013 | 100.0 | 24.1 | 0 | 0 | 10 |
| 13998 | 08/09/2022 05:00 | 59.669333 | 19.0 | 19.0 | 17.3 | 89.87 | 0.0 | 0 | 4.7 | 6.5 | 81.5 | 1013 | 100.0 | 24.1 | 0 | 0 | 10 |
| 13999 | 08/09/2022 05:10 | 59.892000 | 19.0 | 19.0 | 17.3 | 89.87 | 0.0 | 0 | 4.7 | 6.5 | 81.5 | 1013 | 100.0 | 24.1 | 0 | 0 | 10 |
14000 rows × 17 columns
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14000 entries, 0 to 13999
Data columns (total 17 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   date              14000 non-null  object
 1   height            14000 non-null  float64
 2   temp              14000 non-null  float64
 3   feelslike         14000 non-null  float64
 4   dew               14000 non-null  float64
 5   humidity          14000 non-null  float64
 6   precip            14000 non-null  float64
 7   precipprob        14000 non-null  int64
 8   windgust          14000 non-null  float64
 9   windspeed         14000 non-null  float64
 10  winddir           14000 non-null  float64
 11  sealevelpressure  14000 non-null  int64
 12  cloudcover        14000 non-null  float64
 13  visibility        14000 non-null  float64
 14  solarradiation    14000 non-null  int64
 15  uvindex           14000 non-null  int64
 16  severerisk        14000 non-null  int64
dtypes: float64(11), int64(5), object(1)
memory usage: 1.8+ MB
data.describe().T
| count | mean | std | min | 25% | 50% | 75% | max | |
|---|---|---|---|---|---|---|---|---|
| height | 14000.0 | 51.292076 | 24.178707 | 21.977333 | 34.573833 | 45.142333 | 59.586167 | 204.42 |
| temp | 14000.0 | 21.766214 | 2.825473 | 17.500000 | 19.600000 | 20.500000 | 24.000000 | 29.80 |
| feelslike | 14000.0 | 21.846400 | 3.001786 | 17.500000 | 19.600000 | 20.500000 | 24.000000 | 31.00 |
| dew | 14000.0 | 19.255143 | 1.506380 | 13.900000 | 18.400000 | 19.300000 | 20.200000 | 23.20 |
| humidity | 14000.0 | 86.922140 | 13.231809 | 45.860000 | 79.640000 | 92.880000 | 96.960000 | 100.00 |
| precip | 14000.0 | 0.460029 | 1.716985 | 0.000000 | 0.000000 | 0.000000 | 0.300000 | 33.60 |
| precipprob | 14000.0 | 40.371429 | 49.065900 | 0.000000 | 0.000000 | 0.000000 | 100.000000 | 100.00 |
| windgust | 14000.0 | 4.871643 | 2.411981 | 0.700000 | 3.200000 | 4.300000 | 6.100000 | 26.30 |
| windspeed | 14000.0 | 4.640386 | 2.019249 | 0.000000 | 3.200000 | 4.700000 | 5.800000 | 11.50 |
| winddir | 14000.0 | 156.146300 | 94.399413 | 1.000000 | 77.200000 | 98.200000 | 252.800000 | 359.80 |
| sealevelpressure | 14000.0 | 1011.106571 | 1.553784 | 1007.000000 | 1010.000000 | 1011.000000 | 1012.000000 | 1016.00 |
| cloudcover | 14000.0 | 78.253043 | 30.232102 | 1.200000 | 63.400000 | 96.100000 | 100.000000 | 100.00 |
| visibility | 14000.0 | 19.008100 | 8.036918 | 0.100000 | 13.700000 | 24.100000 | 24.100000 | 24.10 |
| solarradiation | 14000.0 | 205.830857 | 297.017731 | 0.000000 | 0.000000 | 6.000000 | 401.000000 | 987.00 |
| uvindex | 14000.0 | 2.046000 | 2.981074 | 0.000000 | 0.000000 | 0.000000 | 4.000000 | 10.00 |
| severerisk | 14000.0 | 13.599143 | 10.788442 | 3.000000 | 10.000000 | 10.000000 | 10.000000 | 75.00 |
Select features that are not redundant: inter-feature correlation < 0.5.
# check correlations
plt.figure(figsize=(15,8))
sns.heatmap(data.corr(), annot=True, cmap='coolwarm', fmt='.2f')
#plt.savefig('corr.png')
<AxesSubplot: >
A positive value indicates a positive relationship, e.g. between precip and height: the larger precip is, the larger height tends to be. A negative value indicates a negative relationship, e.g. between windspeed and height.
Because there are many features, we will pick a subset for deeper exploration. First, we make sure the chosen features are not redundant, i.e. do not carry the same information as another feature (using a threshold of 0.5).
Redundant features:
Next, new features will be extracted from the remaining features to see whether any of them has a high correlation with 'height'.
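The 0.5-redundancy screen described above can also be done programmatically (a sketch; `redundant_pairs` is a hypothetical helper and the toy columns stand in for the notebook's full feature set):

```python
import numpy as np
import pandas as pd

def redundant_pairs(df, threshold=0.5):
    """Return column pairs whose absolute correlation exceeds the threshold."""
    corr = df.corr().abs()
    # keep each pair once: mask everything except the strict upper triangle
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    return [(a, b) for a in upper.index for b in upper.columns
            if upper.loc[a, b] > threshold]

toy = pd.DataFrame({'temp':      [1, 2, 3, 4],
                    'feelslike': [2, 4, 6, 8],    # perfectly correlated with temp
                    'winddir':   [1, -1, 1, -1]}) # weakly correlated
print(redundant_pairs(toy))  # → [('temp', 'feelslike')]
```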
# normalize the date format
data['date'] = data['date'] + ':00'
data['date'] = pd.to_datetime(data['date'], format='%d/%m/%Y %H:%M:%S')
''' LAG '''
# previous values (lag features); the data is on a 10-minute grid,
# so 18 steps = 3 h, 36 = 6 h, 72 = 12 h, 108 = 18 h
# (a positive shift looks back, a negative shift would look forward)
lag_cols = ['height', 'dew', 'humidity', 'precip', 'windgust',
            'sealevelpressure', 'cloudcover', 'visibility']
lag_steps = {'3h': 18, '6h': 36, '12h': 72, '18h': 108}
for suffix, steps in lag_steps.items():
    for col in lag_cols:
        data[f'{col}_{suffix}'] = data[col].shift(steps)
''' DIFFERENCE '''
# new features: difference from the value 3/6/12/18 hours earlier
for suffix, steps in lag_steps.items():
    for col in lag_cols:
        data[f'{col}_diff_{suffix}'] = data[col] - data[col].shift(steps)
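How `shift()` maps to wall-clock lags here: on a 10-minute grid, `shift(18)` looks 3 hours back, and the first 18 rows become NaN (which is why `dropna()` follows). A tiny demonstration with a toy series:

```python
import pandas as pd

s = pd.Series([10.0, 11.0, 12.0, 13.0])
lag = s.shift(2)       # value two rows earlier; first two rows become NaN
diff = s - s.shift(2)  # change versus two rows earlier
print(lag.tolist())    # → [nan, nan, 10.0, 11.0]
print(diff.tolist())   # → [nan, nan, 2.0, 2.0]
```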
# tidy the table: drop the NaN rows introduced by the lag features, then reset the index
data = data.dropna()
data = data.reset_index(drop = True)
data150 = data[data['height'] > 150]  # flood-time subset, for EDA
def plot_eda_grafik(col='col'):
    plt.figure(figsize=(15, 3))
    plt.plot(data[col][1000:2100], '-')
    plt.title("Data Shape", fontweight='bold', fontsize=14)
    plt.ylabel(col)
    plt.legend([col], loc='upper right')
    plt.show()
def plot_eda_dist_scatter(col='col'):
    plt.figure(figsize=(15, 3))
    plt.subplot(121)  # row, col, index
    sns.distplot(data[col], kde=True)
    plt.legend([col], loc='upper right')
    plt.title("Data Distribution", fontweight='bold', fontsize=14)
    plt.subplot(122)
    sns.scatterplot(x=data[col], y=data['height'])
    plt.axhline(150, color='grey', linestyle='dotted')
    plt.title('Scatter Plot', fontweight='bold', fontsize=14)
    plt.show()
def col_info_lag(column):
    # share of all rows whose value lies within the flood-time (height > 150) range of that feature
    def pct_in_flood_range(c):
        return data[(data150[c].min() <= data[c]) & (data[c] <= data150[c].max())].shape[0] / len(data) * 100
    print('Persentase', column, ' :', pct_in_flood_range(column), '%')
    print('Persentase lag 3h :', pct_in_flood_range(column + '_3h'), '%')
    print('Persentase lag 6h :', pct_in_flood_range(column + '_6h'), '%')
    print('Persentase lag 12h:', pct_in_flood_range(column + '_12h'), '%')
    print('Persentase lag 18h:', pct_in_flood_range(column + '_18h'), '%')
    return data150[[column, column + '_3h', column + '_6h', column + '_12h', column + '_18h']].describe().T
def col_info_diff(column):
    # same share, for the difference features
    def pct_in_flood_range(c):
        return data[(data150[c].min() <= data[c]) & (data[c] <= data150[c].max())].shape[0] / len(data) * 100
    print('Persentase diff 3h :', pct_in_flood_range(column + '_diff_3h'), '%')
    print('Persentase diff 6h :', pct_in_flood_range(column + '_diff_6h'), '%')
    print('Persentase diff 12h:', pct_in_flood_range(column + '_diff_12h'), '%')
    print('Persentase diff 18h:', pct_in_flood_range(column + '_diff_18h'), '%')
    return data150[[column + '_diff_3h', column + '_diff_6h', column + '_diff_12h', column + '_diff_18h']].describe().T
c='height'
plot_eda_grafik(col=c)
plot_eda_dist_scatter(col=c)
plot_eda_dist_scatter(col=c+'_3h')
plot_eda_dist_scatter(col=c+'_6h')
plot_eda_dist_scatter(col=c+'_12h')
plot_eda_dist_scatter(col=c+'_18h')
col_info_lag(c)
Persentase height : 1.0077742585660812 %
Persentase lag 3h : 94.4284480276418 %
Persentase lag 6h : 99.14339188021883 %
Persentase lag 12h: 93.97494961128707 %
Persentase lag 18h: 92.63604952490641 %
| count | mean | std | min | 25% | 50% | 75% | max | |
|---|---|---|---|---|---|---|---|---|
| height | 140.0 | 176.832148 | 15.969685 | 150.573333 | 164.107667 | 174.072333 | 192.347667 | 204.420000 |
| height_3h | 140.0 | 128.178524 | 61.322230 | 28.066667 | 68.682500 | 155.238333 | 187.986500 | 204.420000 |
| height_6h | 140.0 | 92.290786 | 60.336501 | 24.878000 | 37.994833 | 80.369333 | 148.633667 | 204.420000 |
| height_12h | 140.0 | 58.491133 | 25.022157 | 24.339333 | 36.927667 | 48.472667 | 82.662167 | 92.665333 |
| height_18h | 140.0 | 63.449195 | 26.289866 | 26.362667 | 41.667500 | 50.155000 | 92.337167 | 95.031333 |
c='height'
plot_eda_dist_scatter(col=c+'_diff_3h')
plot_eda_dist_scatter(col=c+'_diff_6h')
plot_eda_dist_scatter(col=c+'_diff_12h')
plot_eda_dist_scatter(col=c+'_diff_18h')
col_info_diff(c)
Persentase diff 3h : 99.7336596602361 %
Persentase diff 6h : 98.76907572703715 %
Persentase diff 12h: 2.548229196659948 %
Persentase diff 18h: 2.713792110567233 %
| count | mean | std | min | 25% | 50% | 75% | max | |
|---|---|---|---|---|---|---|---|---|
| height_diff_3h | 140.0 | 48.653624 | 64.225509 | -47.192000 | -5.701167 | 33.901667 | 107.496167 | 167.181333 |
| height_diff_6h | 140.0 | 84.541362 | 67.075627 | -41.800000 | 22.633667 | 103.777667 | 142.149667 | 171.178000 |
| height_diff_12h | 140.0 | 118.341014 | 31.182933 | 63.102000 | 92.195333 | 111.611333 | 151.963167 | 170.443333 |
| height_diff_18h | 140.0 | 113.382952 | 32.204895 | 63.519333 | 81.580667 | 111.302667 | 146.252000 | 168.838667 |
c='dew'
plot_eda_grafik(col=c)
plot_eda_dist_scatter(col=c)
plot_eda_dist_scatter(col=c+'_3h')
plot_eda_dist_scatter(col=c+'_6h')
plot_eda_dist_scatter(col=c+'_12h')
plot_eda_dist_scatter(col=c+'_18h')
col_info_lag(c)
Persentase dew : 46.94788367405701 %
Persentase lag 3h : 46.08407716671466 %
Persentase lag 6h : 74.690469334869 %
Persentase lag 12h: 81.86006334581054 %
Persentase lag 18h: 66.57068816585084 %
| count | mean | std | min | 25% | 50% | 75% | max | |
|---|---|---|---|---|---|---|---|---|
| dew | 140.0 | 20.201429 | 0.790955 | 19.4 | 19.7 | 19.9 | 20.3 | 22.4 |
| dew_3h | 140.0 | 20.460000 | 0.759477 | 19.4 | 19.9 | 20.2 | 20.8 | 22.2 |
| dew_6h | 140.0 | 20.557857 | 0.907449 | 18.3 | 20.2 | 20.3 | 21.1 | 21.9 |
| dew_12h | 140.0 | 19.415000 | 1.008091 | 17.8 | 18.8 | 19.2 | 20.3 | 21.9 |
| dew_18h | 140.0 | 18.888571 | 0.731912 | 17.8 | 18.3 | 19.0 | 19.2 | 20.5 |
c='dew'
plot_eda_dist_scatter(col=c+'_diff_3h')
plot_eda_dist_scatter(col=c+'_diff_6h')
plot_eda_dist_scatter(col=c+'_diff_12h')
plot_eda_dist_scatter(col=c+'_diff_18h')
col_info_diff(c)
Persentase diff 3h : 81.60092139360783 %
Persentase diff 6h : 91.18917362510798 %
Persentase diff 12h: 77.06593723006047 %
Persentase diff 18h: 34.5090699683271 %
| count | mean | std | min | 25% | 50% | 75% | max | |
|---|---|---|---|---|---|---|---|---|
| dew_diff_3h | 140.0 | -0.258571 | 0.678338 | -1.7 | -0.7 | -0.2 | 0.000 | 1.2 |
| dew_diff_6h | 140.0 | -0.356429 | 1.349831 | -2.4 | -1.5 | -0.4 | 0.000 | 3.2 |
| dew_diff_12h | 140.0 | 0.786429 | 1.503292 | -2.4 | -0.6 | 0.9 | 2.100 | 3.2 |
| dew_diff_18h | 140.0 | 1.312857 | 0.530080 | 0.2 | 0.8 | 1.4 | 1.725 | 2.1 |
c='humidity'
plot_eda_grafik(col=c)
plot_eda_dist_scatter(col=c)
plot_eda_dist_scatter(col=c+'_3h')
plot_eda_dist_scatter(col=c+'_6h')
plot_eda_dist_scatter(col=c+'_12h')
plot_eda_dist_scatter(col=c+'_18h')
col_info_lag(c)
Persentase humidity : 67.78001727613014 %
Persentase lag 3h : 80.99625683846818 %
Persentase lag 6h : 91.05960264900662 %
Persentase lag 12h: 77.49784048373164 %
Persentase lag 18h: 35.502447451770806 %
| count | mean | std | min | 25% | 50% | 75% | max | |
|---|---|---|---|---|---|---|---|---|
| humidity | 140.0 | 98.360857 | 3.144444 | 84.83 | 99.3800 | 99.38 | 100.00 | 100.00 |
| humidity_3h | 140.0 | 95.202571 | 8.252113 | 73.73 | 98.1700 | 99.38 | 99.38 | 100.00 |
| humidity_6h | 140.0 | 93.729000 | 9.260984 | 62.54 | 94.0300 | 98.17 | 99.38 | 100.00 |
| humidity_12h | 140.0 | 96.091143 | 3.976052 | 68.37 | 95.7700 | 97.54 | 98.15 | 98.76 |
| humidity_18h | 140.0 | 98.026500 | 0.848243 | 95.70 | 97.9875 | 98.15 | 98.76 | 99.38 |
c='humidity'
plot_eda_dist_scatter(col=c+'_diff_3h')
plot_eda_dist_scatter(col=c+'_diff_6h')
plot_eda_dist_scatter(col=c+'_diff_12h')
plot_eda_dist_scatter(col=c+'_diff_18h')
col_info_diff(c)
Persentase diff 3h : 50.66225165562914 %
Persentase diff 6h : 56.88165850849409 %
Persentase diff 12h: 64.79988482579903 %
Persentase diff 18h: 44.241289951050966 %
| count | mean | std | min | 25% | 50% | 75% | max | |
|---|---|---|---|---|---|---|---|---|
| humidity_diff_3h | 140.0 | 3.158286 | 5.574438 | -0.62 | 0.0000 | 0.61 | 1.23 | 19.69 |
| humidity_diff_6h | 140.0 | 4.631857 | 7.305082 | -3.27 | 0.6175 | 1.78 | 5.34 | 30.93 |
| humidity_diff_12h | 140.0 | 2.269714 | 5.611590 | -13.31 | 1.2300 | 1.85 | 4.23 | 31.63 |
| humidity_diff_18h | 140.0 | 0.334357 | 3.184289 | -13.93 | 0.6200 | 1.24 | 1.85 | 4.30 |
c='precip'
plot_eda_grafik(col=c)
plot_eda_dist_scatter(col=c)
plot_eda_dist_scatter(col=c+'_3h')
plot_eda_dist_scatter(col=c+'_6h')
plot_eda_dist_scatter(col=c+'_12h')
plot_eda_dist_scatter(col=c+'_18h')
col_info_lag(c)
Persentase precip : 100.0 %
Persentase lag 3h : 100.0 %
Persentase lag 6h : 100.0 %
Persentase lag 12h: 99.04981284192341 %
Persentase lag 18h: 99.04981284192341 %
| count | mean | std | min | 25% | 50% | 75% | max | |
|---|---|---|---|---|---|---|---|---|
| precip | 140.0 | 8.615714 | 8.257870 | 0.0 | 1.8 | 8.7 | 12.200 | 33.6 |
| precip_3h | 140.0 | 6.651429 | 8.104013 | 0.0 | 1.0 | 2.8 | 10.500 | 33.6 |
| precip_6h | 140.0 | 3.248571 | 7.060524 | 0.0 | 0.0 | 1.3 | 2.800 | 33.6 |
| precip_12h | 140.0 | 1.157143 | 2.046585 | 0.0 | 0.0 | 0.4 | 1.300 | 9.0 |
| precip_18h | 140.0 | 1.390000 | 2.001521 | 0.0 | 0.2 | 0.7 | 1.525 | 9.0 |
c='precip'
plot_eda_dist_scatter(col=c+'_diff_3h')
plot_eda_dist_scatter(col=c+'_diff_6h')
plot_eda_dist_scatter(col=c+'_diff_12h')
plot_eda_dist_scatter(col=c+'_diff_18h')
col_info_diff(c)
Persentase diff 3h : 99.91361934926577 %
Persentase diff 6h : 99.95680967463288 %
Persentase diff 12h: 98.31557731068241 %
Persentase diff 18h: 92.35531241002015 %
| count | mean | std | min | 25% | 50% | 75% | max | |
|---|---|---|---|---|---|---|---|---|
| precip_diff_3h | 140.0 | 1.964286 | 9.315309 | -18.3 | -1.00 | 1.1 | 7.3 | 28.5 |
| precip_diff_6h | 140.0 | 5.367143 | 10.083496 | -23.8 | 0.15 | 5.1 | 9.2 | 33.1 |
| precip_diff_12h | 140.0 | 7.458571 | 8.173232 | -3.9 | 0.00 | 7.1 | 11.9 | 30.3 |
| precip_diff_18h | 140.0 | 7.225714 | 7.859571 | -1.0 | 0.00 | 6.4 | 11.1 | 32.8 |
c='windgust'
plot_eda_grafik(col=c)
plot_eda_dist_scatter(col=c)
plot_eda_dist_scatter(col=c+'_3h')
plot_eda_dist_scatter(col=c+'_6h')
plot_eda_dist_scatter(col=c+'_12h')
plot_eda_dist_scatter(col=c+'_18h')
col_info_lag(c)
Persentase windgust : 25.17995968902966 %
Persentase lag 3h : 65.05902677800172 %
Persentase lag 6h : 99.61128707169594 %
Persentase lag 12h: 98.22919665994817 %
Persentase lag 18h: 93.72300604664555 %
| count | mean | std | min | 25% | 50% | 75% | max | |
|---|---|---|---|---|---|---|---|---|
| windgust | 140.0 | 9.039286 | 3.077154 | 6.1 | 6.5 | 8.6 | 10.4 | 17.6 |
| windgust_3h | 140.0 | 7.538571 | 3.127313 | 4.0 | 6.1 | 6.5 | 7.9 | 17.6 |
| windgust_6h | 140.0 | 6.404286 | 3.459631 | 1.1 | 4.3 | 6.1 | 7.6 | 17.6 |
| windgust_12h | 140.0 | 4.767143 | 2.690844 | 1.1 | 3.6 | 4.0 | 4.7 | 11.9 |
| windgust_18h | 140.0 | 5.380714 | 2.503376 | 2.2 | 3.6 | 4.3 | 6.5 | 11.9 |
c='windgust'
plot_eda_dist_scatter(col=c+'_diff_3h')
plot_eda_dist_scatter(col=c+'_diff_6h')
plot_eda_dist_scatter(col=c+'_diff_12h')
plot_eda_dist_scatter(col=c+'_diff_18h')
col_info_diff(c)
Persentase diff 3h : 98.40195796141664 %
Persentase diff 6h : 99.265764468759 %
Persentase diff 12h: 80.00287935502448 %
Persentase diff 18h: 86.69737978692773 %
| count | mean | std | min | 25% | 50% | 75% | max | |
|---|---|---|---|---|---|---|---|---|
| windgust_diff_3h | 140.0 | 1.500714 | 3.989636 | -7.2 | -0.7 | 1.8 | 4.3 | 8.2 |
| windgust_diff_6h | 140.0 | 2.635000 | 4.137874 | -9.0 | 0.4 | 1.8 | 6.1 | 11.1 |
| windgust_diff_12h | 140.0 | 4.272143 | 3.266795 | -2.5 | 2.4 | 3.2 | 6.1 | 11.5 |
| windgust_diff_18h | 140.0 | 3.658571 | 3.995285 | -2.9 | 1.8 | 2.9 | 5.7 | 14.4 |
c='sealevelpressure'
plot_eda_grafik(col=c)
plot_eda_dist_scatter(col=c)
plot_eda_dist_scatter(col=c+'_3h')
plot_eda_dist_scatter(col=c+'_6h')
plot_eda_dist_scatter(col=c+'_12h')
plot_eda_dist_scatter(col=c+'_18h')
col_info_lag(c)
Persentase sealevelpressure : 94.25568672617334 %
Persentase lag 3h : 94.25568672617334 %
Persentase lag 6h : 94.28448027641808 %
Persentase lag 12h: 65.53412035704002 %
Persentase lag 18h: 93.59343507054419 %
| count | mean | std | min | 25% | 50% | 75% | max | |
|---|---|---|---|---|---|---|---|---|
| sealevelpressure | 140.0 | 1011.678571 | 1.637105 | 1009.0 | 1010.0 | 1012.0 | 1013.0 | 1015.0 |
| sealevelpressure_3h | 140.0 | 1011.271429 | 1.392905 | 1009.0 | 1011.0 | 1011.0 | 1012.0 | 1015.0 |
| sealevelpressure_6h | 140.0 | 1011.514286 | 0.978074 | 1009.0 | 1011.0 | 1011.0 | 1012.0 | 1015.0 |
| sealevelpressure_12h | 140.0 | 1012.178571 | 0.954112 | 1011.0 | 1011.0 | 1012.0 | 1013.0 | 1014.0 |
| sealevelpressure_18h | 140.0 | 1012.028571 | 1.059000 | 1009.0 | 1011.0 | 1012.0 | 1013.0 | 1014.0 |
c='sealevelpressure'
plot_eda_dist_scatter(col=c+'_diff_3h')
plot_eda_dist_scatter(col=c+'_diff_6h')
plot_eda_dist_scatter(col=c+'_diff_12h')
plot_eda_dist_scatter(col=c+'_diff_18h')
col_info_diff(c)
Persentase diff 3h : 97.23581917650446 %
Persentase diff 6h : 94.86035128131299 %
Persentase diff 12h: 76.07255974661675 %
Persentase diff 18h: 95.50820616181976 %
| count | mean | std | min | 25% | 50% | 75% | max | |
|---|---|---|---|---|---|---|---|---|
| sealevelpressure_diff_3h | 140.0 | 0.407143 | 1.790850 | -2.0 | -1.0 | 0.0 | 2.0 | 4.0 |
| sealevelpressure_diff_6h | 140.0 | 0.164286 | 2.361595 | -3.0 | -2.0 | -1.0 | 2.0 | 4.0 |
| sealevelpressure_diff_12h | 140.0 | -0.500000 | 1.172220 | -3.0 | -1.0 | -1.0 | 1.0 | 1.0 |
| sealevelpressure_diff_18h | 140.0 | -0.350000 | 2.201781 | -4.0 | -2.0 | -1.0 | 1.0 | 4.0 |
c='cloudcover'
plot_eda_grafik(col=c)
plot_eda_dist_scatter(col=c)
plot_eda_dist_scatter(col=c+'_3h')
plot_eda_dist_scatter(col=c+'_6h')
plot_eda_dist_scatter(col=c+'_12h')
plot_eda_dist_scatter(col=c+'_18h')
col_info_lag(c)
Persentase cloudcover : 81.90325367117765 %
Persentase lag 3h : 56.809674632882235 %
Persentase lag 6h : 81.42816009213935 %
Persentase lag 12h: 90.88684134753815 %
Persentase lag 18h: 91.92340915634898 %
| count | mean | std | min | 25% | 50% | 75% | max | |
|---|---|---|---|---|---|---|---|---|
| cloudcover | 140.0 | 97.282857 | 11.073021 | 45.5 | 100.0 | 100.0 | 100.0 | 100.0 |
| cloudcover_3h | 140.0 | 98.798571 | 2.457729 | 90.7 | 100.0 | 100.0 | 100.0 | 100.0 |
| cloudcover_6h | 140.0 | 94.325000 | 13.062711 | 46.9 | 96.1 | 100.0 | 100.0 | 100.0 |
| cloudcover_12h | 140.0 | 97.097857 | 9.374851 | 18.9 | 99.7 | 100.0 | 100.0 | 100.0 |
| cloudcover_18h | 140.0 | 86.723571 | 25.241395 | 17.2 | 89.3 | 100.0 | 100.0 | 100.0 |
c='cloudcover'
plot_eda_dist_scatter(col=c+'_diff_3h')
plot_eda_dist_scatter(col=c+'_diff_6h')
plot_eda_dist_scatter(col=c+'_diff_12h')
plot_eda_dist_scatter(col=c+'_diff_18h')
col_info_diff(c)
Persentase diff 3h : 65.40454938093868 %
Persentase diff 6h : 84.14915059026778 %
Persentase diff 12h: 87.64756694500431 %
Persentase diff 18h: 86.91333141376332 %
| count | mean | std | min | 25% | 50% | 75% | max | |
|---|---|---|---|---|---|---|---|---|
| cloudcover_diff_3h | 140.0 | -1.515714 | 9.516100 | -45.2 | 0.0 | 0.0 | 0.0 | 6.8 |
| cloudcover_diff_6h | 140.0 | 2.957857 | 18.008543 | -54.5 | 0.0 | 0.0 | 3.9 | 53.1 |
| cloudcover_diff_12h | 140.0 | 0.185000 | 15.017495 | -54.5 | 0.0 | 0.0 | 0.3 | 81.1 |
| cloudcover_diff_18h | 140.0 | 10.559286 | 28.851460 | -54.5 | 0.0 | 0.0 | 10.7 | 82.8 |
c='visibility'
plot_eda_grafik(col=c)
plot_eda_dist_scatter(col=c)
plot_eda_dist_scatter(col=c+'_3h')
plot_eda_dist_scatter(col=c+'_6h')
plot_eda_dist_scatter(col=c+'_12h')
plot_eda_dist_scatter(col=c+'_18h')
col_info_lag(c)
Persentase visibility : 100.0 %
Persentase lag 3h : 100.0 %
Persentase lag 6h : 100.0 %
Persentase lag 12h: 91.96659948171609 %
Persentase lag 18h: 98.01324503311258 %
| count | mean | std | min | 25% | 50% | 75% | max | |
|---|---|---|---|---|---|---|---|---|
| visibility | 140.0 | 2.952143 | 6.193725 | 0.1 | 0.10 | 0.1 | 0.2 | 24.1 |
| visibility_3h | 140.0 | 3.417857 | 6.754932 | 0.1 | 0.10 | 0.2 | 2.6 | 24.1 |
| visibility_6h | 140.0 | 9.255000 | 10.545321 | 0.1 | 0.20 | 2.6 | 23.9 | 24.1 |
| visibility_12h | 140.0 | 16.252143 | 7.881902 | 2.4 | 7.65 | 17.8 | 24.1 | 24.1 |
| visibility_18h | 140.0 | 11.208571 | 6.943383 | 0.2 | 4.60 | 11.6 | 16.1 | 24.1 |
c='visibility'
plot_eda_dist_scatter(col=c+'_diff_3h')
plot_eda_dist_scatter(col=c+'_diff_6h')
plot_eda_dist_scatter(col=c+'_diff_12h')
plot_eda_dist_scatter(col=c+'_diff_18h')
col_info_diff(c)
Persentase diff 3h : 86.48142816009215 %
Persentase diff 6h : 75.94298877051541 %
Persentase diff 12h: 78.5200115174201 %
Persentase diff 18h: 86.74057011229485 %
| count | mean | std | min | 25% | 50% | 75% | max | |
|---|---|---|---|---|---|---|---|---|
| visibility_diff_3h | 140.0 | -0.465714 | 3.488105 | -11.3 | -0.3 | 0.0 | 0.0 | 11.2 |
| visibility_diff_6h | 140.0 | -6.302857 | 8.032891 | -23.7 | -11.3 | -0.4 | -0.1 | 0.1 |
| visibility_diff_12h | 140.0 | -13.300000 | 10.092222 | -24.0 | -24.0 | -15.9 | -5.5 | 7.3 |
| visibility_diff_18h | 140.0 | -8.256429 | 9.286303 | -24.0 | -15.0 | -11.3 | -2.3 | 12.5 |
Conclusion: shortlist the candidate features identified in the EDA above and check their correlations.
NUMS=data[['height','dew','humidity','windgust',
'dew_3h','dew_18h','humidity_18h','windgust_3h','sealevelpressure_12h','cloudcover_3h',
'height_diff_12h','height_diff_18h','dew_diff_18h','humidity_diff_3h','humidity_diff_6h','humidity_diff_12h','humidity_diff_18h','cloudcover_diff_3h']]
plt.figure(figsize=(15,8))
sns.heatmap(NUMS.corr(), annot=True, cmap='coolwarm', fmt='.2f')
#plt.savefig('corr.png')
Select non-redundant features: pairwise correlation between features < 0.5 and correlation with the target > 0.2.
NUMS=data[['height','humidity','windgust',
'humidity_18h','windgust_3h','cloudcover_3h',
'height_diff_18h']]
plt.figure(figsize=(8,4))
sns.heatmap(NUMS.corr(), annot=True, cmap='coolwarm', fmt='.2f')
#plt.savefig('corr.png')
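The selection rule above (|correlation with target| > 0.2, pairwise |correlation| < 0.5) can be expressed as a small greedy filter. `select_features` and the synthetic `demo` frame below are illustrative sketches only, not part of the notebook's pipeline:

```python
import numpy as np
import pandas as pd

def select_features(df, target, target_min=0.2, pair_max=0.5):
    """Greedy filter: keep features with |corr(feature, target)| > target_min,
    skipping any candidate correlated >= pair_max with an already kept one."""
    corr = df.corr().abs()
    # candidates ordered by strength of association with the target
    candidates = (corr[target].drop(target)
                  .loc[lambda s: s > target_min]
                  .sort_values(ascending=False))
    kept = []
    for feat in candidates.index:
        if all(corr.loc[feat, k] < pair_max for k in kept):
            kept.append(feat)
    return kept

# tiny synthetic demo frame (not the project data)
rng = np.random.default_rng(0)
t = rng.normal(size=300)
demo = pd.DataFrame({'height': t,
                     'a': t + rng.normal(scale=0.5, size=300),   # informative
                     'noise': rng.normal(size=300)})             # uninformative
demo['a_copy'] = demo['a'] + rng.normal(scale=0.01, size=300)    # redundant with 'a'

kept = select_features(demo, 'height')
```

Because `a` and `a_copy` are almost perfectly correlated, only one of them survives the pairwise check, mirroring how the redundant lag/diff columns were dropped above.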
# Split into features/predictors and label
X = data[['height','windgust','cloudcover_3h','humidity_18h','height_diff_18h']]
y = data[['height']]
X.head()
| height | windgust | cloudcover_3h | humidity_18h | height_diff_18h | |
|---|---|---|---|---|---|
| 0 | 42.240000 | 2.9 | 64.3 | 93.32 | 18.717250 |
| 1 | 42.139333 | 2.9 | 64.3 | 93.32 | 17.694119 |
| 2 | 41.734667 | 2.9 | 64.3 | 93.32 | 16.366988 |
| 3 | 41.364000 | 2.9 | 64.3 | 93.32 | 15.073857 |
| 4 | 41.330000 | 2.9 | 64.3 | 93.32 | 14.117393 |
y.head()
| height | |
|---|---|
| 0 | 42.240000 |
| 1 | 42.139333 |
| 2 | 41.734667 |
| 3 | 41.364000 |
| 4 | 41.330000 |
train | val | test
70% | 15% | 15%
# split data train|val|test
train_end = int(len(X)*0.7)
val_end = int(len(X)*0.85)
X_train = X[:train_end]
y_train = y[:train_end]
X_val = X[train_end:val_end]
y_val = y[train_end:val_end]
X_test = X[val_end:]
y_test = y[val_end:]
X_train.shape, y_train.shape, X_val.shape, y_val.shape, X_test.shape, y_test.shape
((9724, 5), (9724, 1), (2084, 5), (2084, 1), (2084, 5), (2084, 1))
# visualize the train/validation/test split
plt.figure(figsize=(15, 5))
plt.plot(y_train)
plt.plot(y_val)
plt.plot(y_test)
plt.title('Pembagian Dataset', fontweight='bold',fontsize= 14)
plt.axvline(X_train.shape[0], color='black', linestyle='--')
plt.axvline(X_train.shape[0]+X_val.shape[0], color='black', linestyle='--')
plt.legend(['Train', 'Validation', 'Test'])
The GRU can cope with inputs that are not normally distributed, so the data is only min-max scaled; this also makes the model's predictions easier to evaluate on a common [0, 1] range.
# scale/normalize the data
'''
Fit the scalers on the training data only,
so that no data leakage occurs.
'''
# X
scaler_X = MinMaxScaler().fit(X_train)
X_train = scaler_X.transform(X_train)
X_val = scaler_X.transform(X_val)
X_test = scaler_X.transform(X_test)
# y
scaler_y = MinMaxScaler().fit(y_train)
y_train = scaler_y.transform(y_train)
y_val = scaler_y.transform(y_val)
y_test = scaler_y.transform(y_test)
# reshape
y_train = y_train.reshape(-1)
y_val = y_val.reshape(-1)
y_test = y_test.reshape(-1)
X_train[:10]
array([[0.1125833 , 0.0859375 , 0.63866397, 0.87661618, 0.48567867],
[0.11202397, 0.0859375 , 0.63866397, 0.87661618, 0.48193766],
[0.10977557, 0.0859375 , 0.63866397, 0.87661618, 0.4770851 ],
[0.10771607, 0.0859375 , 0.63866397, 0.87661618, 0.47235685],
[0.10752716, 0.0859375 , 0.63866397, 0.87661618, 0.46885961],
[0.10696413, 0.0859375 , 0.63866397, 0.87661618, 0.46511616],
[0.10546766, 0.05859375, 0.38967611, 0.85519025, 0.46075843],
[0.10696784, 0.05859375, 0.38967611, 0.85519025, 0.45837274],
[0.10603069, 0.05859375, 0.38967611, 0.85519025, 0.45380951],
[0.10528246, 0.05859375, 0.38967611, 0.85519025, 0.44789097]])
y_train[:10]
array([0.1125833 , 0.11202397, 0.10977557, 0.10771607, 0.10752716,
0.10696413, 0.10546766, 0.10696784, 0.10603069, 0.10528246])
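Since `MinMaxScaler` is invertible, predictions made in the scaled [0, 1] space can be mapped back to heights via `scaler_y.inverse_transform`. A standalone round-trip sketch (the height values here are hypothetical; in the notebook the scaler is fitted on `y_train`):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# hypothetical height values (cm); in the notebook, scaler_y is fitted on y_train
heights_cm = np.array([[10.0], [50.0], [210.0]])
scaler = MinMaxScaler().fit(heights_cm)

scaled = scaler.transform(heights_cm).reshape(-1)    # what the model sees/predicts
back = scaler.inverse_transform(scaled.reshape(-1, 1)).reshape(-1)

assert np.allclose(back, heights_cm.reshape(-1))     # exact round trip
```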
Save the scalers:
import joblib
joblib.dump(scaler_X, 'scaler/scaler_X_prediksi.save')
joblib.dump(scaler_y, 'scaler/scaler_y_prediksi.save')
['scaler/scaler_y_prediksi.save']
# sliding-window function
def create_window(data, window_size, future_size, label):
X_window = []
y_window = []
for i in range(len(data) - window_size - future_size):
X_window.append(data[i:i+window_size])
y_window.append(label[i+window_size:i+window_size+future_size])
return np.array(X_window), np.array(y_window)
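A quick shape check of the windowing logic on a toy series (the function is restated so the snippet runs standalone):

```python
import numpy as np

def create_window(data, window_size, future_size, label):
    # identical to the definition above
    X_window, y_window = [], []
    for i in range(len(data) - window_size - future_size):
        X_window.append(data[i:i+window_size])
        y_window.append(label[i+window_size:i+window_size+future_size])
    return np.array(X_window), np.array(y_window)

series = np.arange(20, dtype=float).reshape(10, 2)  # 10 timesteps, 2 features
label = np.arange(10, dtype=float)

Xw, yw = create_window(series, window_size=4, future_size=2, label=label)
assert Xw.shape == (4, 4, 2)       # 10 - 4 - 2 = 4 windows
assert yw.shape == (4, 2)
assert list(yw[0]) == [4.0, 5.0]   # labels immediately after the first window
```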
# window (1)
window_size_1 = 144
future_size_1 = 36
X_train_window_1, y_train_window_1 = create_window(X_train, window_size_1, future_size_1, y_train)
X_val_window_1, y_val_window_1 = create_window(X_val, window_size_1, future_size_1, y_val)
X_test_window_1, y_test_window_1 = create_window(X_test, window_size_1, future_size_1, y_test)
print(X_train_window_1.shape, y_train_window_1.shape)
print(X_val_window_1.shape, y_val_window_1.shape)
print(X_test_window_1.shape, y_test_window_1.shape)
(9544, 144, 5) (9544, 36) (1904, 144, 5) (1904, 36) (1904, 144, 5) (1904, 36)
# window (2)
window_size_2 = 144
future_size_2 = 72
X_train_window_2, y_train_window_2 = create_window(X_train, window_size_2, future_size_2, y_train)
X_val_window_2, y_val_window_2 = create_window(X_val, window_size_2, future_size_2, y_val)
X_test_window_2, y_test_window_2 = create_window(X_test, window_size_2, future_size_2, y_test)
print(X_train_window_2.shape, y_train_window_2.shape)
print(X_val_window_2.shape, y_val_window_2.shape)
print(X_test_window_2.shape, y_test_window_2.shape)
(9508, 144, 5) (9508, 72) (1868, 144, 5) (1868, 72) (1868, 144, 5) (1868, 72)
# window (3)
window_size_3 = 288
future_size_3 = 36
X_train_window_3, y_train_window_3 = create_window(X_train, window_size_3, future_size_3, y_train)
X_val_window_3, y_val_window_3 = create_window(X_val, window_size_3, future_size_3, y_val)
X_test_window_3, y_test_window_3 = create_window(X_test, window_size_3, future_size_3, y_test)
print(X_train_window_3.shape, y_train_window_3.shape)
print(X_val_window_3.shape, y_val_window_3.shape)
print(X_test_window_3.shape, y_test_window_3.shape)
(9400, 288, 5) (9400, 36) (1760, 288, 5) (1760, 36) (1760, 288, 5) (1760, 36)
# window (4)
window_size_4 = 288
future_size_4 = 72
X_train_window_4, y_train_window_4 = create_window(X_train, window_size_4, future_size_4, y_train)
X_val_window_4, y_val_window_4 = create_window(X_val, window_size_4, future_size_4, y_val)
X_test_window_4, y_test_window_4 = create_window(X_test, window_size_4, future_size_4, y_test)
print(X_train_window_4.shape, y_train_window_4.shape)
print(X_val_window_4.shape, y_val_window_4.shape)
print(X_test_window_4.shape, y_test_window_4.shape)
(9364, 288, 5) (9364, 72) (1724, 288, 5) (1724, 72) (1724, 288, 5) (1724, 72)
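The window counts printed above all follow `len(split) - window_size - future_size`, which is exactly the number of iterations in `create_window`. A quick arithmetic check against the reported shapes:

```python
# split sizes from the train/val/test split above (val and test are equal)
n_train, n_val_test = 9724, 2084

# (window_size, future_size) for configurations 1-4
configs = [(144, 36), (144, 72), (288, 36), (288, 72)]

assert [n_train - w - f for w, f in configs] == [9544, 9508, 9400, 9364]
assert [n_val_test - w - f for w, f in configs] == [1904, 1868, 1760, 1724]
```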
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GRU, Dense
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping, LearningRateScheduler
from sklearn.metrics import mean_squared_error  # mean_absolute_error is imported above
# Define a learning rate schedule function
# (note: it only takes effect when passed to fit() via LearningRateScheduler)
def lr_schedule(epoch):
    if epoch <= 25:
        return 0.0001
    else:
        return 0.00001
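As defined, the schedule holds 1e-4 through epoch 25 and drops to 1e-5 afterwards; with `EPOCHS=20` the decay step is never reached, so it is effectively a constant learning rate. A pure-Python check (the function is restated so the snippet runs standalone):

```python
def lr_schedule(epoch):
    # same rule as above: 1e-4 up to epoch 25, then 1e-5
    return 0.0001 if epoch <= 25 else 0.00001

# To apply it during training it would be registered as a callback, e.g.:
#   callbacks=[early_stopping, LearningRateScheduler(lr_schedule)]
rates = [lr_schedule(e) for e in range(30)]
assert rates[25] == 0.0001 and rates[26] == 0.00001
assert all(r == 0.0001 for r in rates[:20])  # with EPOCHS=20, lr never decays
```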
def plot_loss(history, model_name):
plt.figure(figsize = (6, 4))
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Train vs Validation (Loss) for ' + model_name, fontsize= 12)
plt.ylabel('Loss (MSE)')
plt.xlabel('Epoch')
plt.legend(['Train loss', 'Validation loss'], loc='upper right')
def plot_mae(history, model_name):
plt.figure(figsize = (6, 4))
plt.plot(history.history['mean_absolute_error'])
plt.plot(history.history['val_mean_absolute_error'])
plt.title('Train vs Validation (MAE) for ' + model_name, fontsize= 12)
plt.ylabel('MAE')
plt.xlabel('Epoch')
plt.legend(['Train mae', 'Validation mae'], loc='upper right')
def create_time_steps(length):
return list(range(-length, 0))
def plot_pred0(history, true_future, prediction):
plt.figure(figsize=(10, 4))
num_in = create_time_steps(len(history))
num_out = len(true_future)
plt.plot(num_in, np.array(history), 'k.', label='History')
plt.plot(np.arange(num_out), np.array(true_future), '.',
label='True Future')
if prediction.any():
plt.plot(np.arange(num_out), np.array(prediction), 'r.',
label='Predicted Future')
plt.title('Plot Prediction', fontsize= 12)
plt.legend(loc='upper left')
plt.show()
def plot_pred1(history, true_future, prediction):
plt.figure(figsize=(10, 4))
num_in = create_time_steps(len(history))
num_out = len(true_future)
plt.plot(num_in, np.array(history), 'k.', label='History')
plt.plot(np.arange(num_out), np.array(true_future), '.',
label='True Future')
if prediction.any():
plt.plot(np.arange(num_out), np.array(prediction), 'r.',
label='Predicted Future')
plt.title('Plot Prediction', fontsize= 12)
plt.legend(loc='upper left')
plt.ylim(0, 1)  # fix the y-axis range from 0 to 1
plt.show()
def eval(model, X_train_window, X_val_window, X_test_window, y_train_window, y_val_window, y_test_window):
    # NB: shadows the built-in eval(); metrics are in min-max-scaled units, not cm
    y_pred_train = model.predict(X_train_window)
    y_pred_val = model.predict(X_val_window)
    y_pred_test = model.predict(X_test_window)
    print("Evaluation results:")
    print("MAE on train data: ", mean_absolute_error(y_train_window, y_pred_train).round(5))
    print("MSE on train data: ", mean_squared_error(y_train_window, y_pred_train).round(5))
    print("MAE on validation data: ", mean_absolute_error(y_val_window, y_pred_val).round(5))
    print("MSE on validation data: ", mean_squared_error(y_val_window, y_pred_val).round(5))
    print("MAE on test data: ", mean_absolute_error(y_test_window, y_pred_test).round(5))
    print("MSE on test data: ", mean_squared_error(y_test_window, y_pred_test).round(5))
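The metrics above are computed on min-max-scaled values. Because `MinMaxScaler` is a linear map, a scaled absolute error converts back to centimeters by multiplying by the training range. A sketch under an assumed range (the real range comes from `scaler_y`, not the values used here):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# hypothetical training heights (cm); in the notebook the range comes from scaler_y
heights_cm = np.array([[0.0], [375.0]])
s = MinMaxScaler().fit(heights_cm)
span_cm = float(s.data_max_[0] - s.data_min_[0])

mae_scaled = 0.0367          # e.g. a test MAE in scaled units
mae_cm = mae_scaled * span_cm
assert abs(mae_cm - 13.7625) < 1e-9
```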
EPOCHS=20
PATIENCE=20  # equal to EPOCHS, so early stopping never triggers in these runs
# Build the GRU model
model_1 = Sequential()
model_1.add(GRU(window_size_1,
input_shape=(window_size_1, X_train.shape[1]),
return_sequences=False))
model_1.add(Dense(future_size_1))
model_1.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
gru (GRU) (None, 144) 65232
dense (Dense) (None, 36) 5220
=================================================================
Total params: 70,452
Trainable params: 70,452
Non-trainable params: 0
_________________________________________________________________
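The parameter counts in the summary can be reproduced from the layer sizes. A Keras GRU with the default `reset_after=True` has 3 gates, each with an input kernel, a recurrent kernel, and two bias vectors:

```python
def gru_params(units, n_features):
    # 3 gates x (recurrent kernel + input kernel + input bias + recurrent bias)
    return 3 * (units * units + units * n_features + 2 * units)

def dense_params(n_in, n_out):
    return n_in * n_out + n_out

assert gru_params(144, 5) == 65232     # gru (GRU) in the summary above
assert dense_params(144, 36) == 5220   # dense (Dense)
assert gru_params(288, 5) == 254880    # GRU layer in models 3 and 4
```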
model_1.compile(loss=tf.keras.losses.MeanSquaredError(),
                optimizer=Adam(learning_rate=0.0001),
                metrics=[tf.keras.metrics.MeanAbsoluteError()])
early_stopping = EarlyStopping(monitor='val_loss',
patience=PATIENCE,
verbose=1,
restore_best_weights=True)
history_1 = model_1.fit(X_train_window_1, y_train_window_1,
validation_data=(X_val_window_1, y_val_window_1),
epochs=EPOCHS, batch_size=32,
callbacks=[early_stopping])
Epoch 1/20 299/299 [==============================] - 10s 24ms/step - loss: 0.0135 - mean_absolute_error: 0.0783 - val_loss: 0.0097 - val_mean_absolute_error: 0.0683 Epoch 2/20 299/299 [==============================] - 7s 22ms/step - loss: 0.0057 - mean_absolute_error: 0.0419 - val_loss: 0.0070 - val_mean_absolute_error: 0.0317 Epoch 3/20 299/299 [==============================] - 7s 22ms/step - loss: 0.0050 - mean_absolute_error: 0.0358 - val_loss: 0.0069 - val_mean_absolute_error: 0.0360 Epoch 4/20 299/299 [==============================] - 7s 22ms/step - loss: 0.0047 - mean_absolute_error: 0.0334 - val_loss: 0.0070 - val_mean_absolute_error: 0.0404 Epoch 5/20 299/299 [==============================] - 7s 22ms/step - loss: 0.0045 - mean_absolute_error: 0.0321 - val_loss: 0.0066 - val_mean_absolute_error: 0.0354 Epoch 6/20 299/299 [==============================] - 6s 22ms/step - loss: 0.0043 - mean_absolute_error: 0.0317 - val_loss: 0.0063 - val_mean_absolute_error: 0.0340 Epoch 7/20 299/299 [==============================] - 7s 22ms/step - loss: 0.0042 - mean_absolute_error: 0.0311 - val_loss: 0.0064 - val_mean_absolute_error: 0.0315 Epoch 8/20 299/299 [==============================] - 7s 22ms/step - loss: 0.0041 - mean_absolute_error: 0.0303 - val_loss: 0.0065 - val_mean_absolute_error: 0.0367 Epoch 9/20 299/299 [==============================] - 7s 22ms/step - loss: 0.0040 - mean_absolute_error: 0.0298 - val_loss: 0.0061 - val_mean_absolute_error: 0.0337 Epoch 10/20 299/299 [==============================] - 7s 22ms/step - loss: 0.0039 - mean_absolute_error: 0.0295 - val_loss: 0.0062 - val_mean_absolute_error: 0.0323 Epoch 11/20 299/299 [==============================] - 7s 22ms/step - loss: 0.0038 - mean_absolute_error: 0.0287 - val_loss: 0.0062 - val_mean_absolute_error: 0.0338 Epoch 12/20 299/299 [==============================] - 7s 22ms/step - loss: 0.0038 - mean_absolute_error: 0.0287 - val_loss: 0.0058 - val_mean_absolute_error: 0.0253 Epoch 13/20 
299/299 [==============================] - 7s 22ms/step - loss: 0.0037 - mean_absolute_error: 0.0286 - val_loss: 0.0060 - val_mean_absolute_error: 0.0268 Epoch 14/20 299/299 [==============================] - 7s 22ms/step - loss: 0.0037 - mean_absolute_error: 0.0284 - val_loss: 0.0059 - val_mean_absolute_error: 0.0278 Epoch 15/20 299/299 [==============================] - 8s 26ms/step - loss: 0.0037 - mean_absolute_error: 0.0286 - val_loss: 0.0063 - val_mean_absolute_error: 0.0354 Epoch 16/20 299/299 [==============================] - 7s 24ms/step - loss: 0.0037 - mean_absolute_error: 0.0284 - val_loss: 0.0062 - val_mean_absolute_error: 0.0294 Epoch 17/20 299/299 [==============================] - 7s 24ms/step - loss: 0.0037 - mean_absolute_error: 0.0283 - val_loss: 0.0061 - val_mean_absolute_error: 0.0311 Epoch 18/20 299/299 [==============================] - 7s 23ms/step - loss: 0.0036 - mean_absolute_error: 0.0279 - val_loss: 0.0058 - val_mean_absolute_error: 0.0272 Epoch 19/20 299/299 [==============================] - 7s 23ms/step - loss: 0.0036 - mean_absolute_error: 0.0282 - val_loss: 0.0059 - val_mean_absolute_error: 0.0252 Epoch 20/20 299/299 [==============================] - 7s 23ms/step - loss: 0.0036 - mean_absolute_error: 0.0278 - val_loss: 0.0060 - val_mean_absolute_error: 0.0285
# plot the training curves via plot_loss and plot_mae
plot_loss(history_1, 'Model 1')
plot_mae(history_1, 'Model 1')
# show the model evaluation results
eval(model_1, X_train_window_1, X_val_window_1, X_test_window_1, y_train_window_1, y_val_window_1, y_test_window_1)
299/299 [==============================] - 3s 8ms/step
60/60 [==============================] - 0s 8ms/step
60/60 [==============================] - 0s 8ms/step
Evaluation results:
MAE on train data:  0.02792
MSE on train data:  0.00358
MAE on validation data:  0.02853
MSE on validation data:  0.00596
MAE on test data:  0.0367
MSE on test data:  0.00609
# prediction: plot a few test windows (run predict once, then index into it)
y_pred_test_1 = model_1.predict(X_test_window_1)
for i in [795, 1001, 1016, 1085]:
    plot_pred0(y_test_window_1[(i-window_size_1):i, 0],
               y_test_window_1[i],
               y_pred_test_1[i])
60/60 [==============================] - 1s 10ms/step
# prediction: same windows, with the y-axis fixed to [0, 1] (plot_pred1)
y_pred_test_1 = model_1.predict(X_test_window_1)
for i in [795, 1001, 1016, 1085]:
    plot_pred1(y_test_window_1[(i-window_size_1):i, 0],
               y_test_window_1[i],
               y_pred_test_1[i])
60/60 [==============================] - 1s 10ms/step
# Build the GRU model
model_2 = Sequential()
model_2.add(GRU(window_size_2,
input_shape=(window_size_2, X_train.shape[1]),
return_sequences=False))
model_2.add(Dense(future_size_2))
model_2.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
gru_1 (GRU) (None, 144) 65232
dense_1 (Dense) (None, 72) 10440
=================================================================
Total params: 75,672
Trainable params: 75,672
Non-trainable params: 0
_________________________________________________________________
model_2.compile(loss=tf.keras.losses.MeanSquaredError(),
                optimizer=Adam(learning_rate=0.0001),
                metrics=[tf.keras.metrics.MeanAbsoluteError()])
early_stopping = EarlyStopping(monitor='val_loss',
patience=PATIENCE,
verbose=1,
restore_best_weights=True)
history_2 = model_2.fit(X_train_window_2, y_train_window_2,
validation_data=(X_val_window_2, y_val_window_2),
epochs=EPOCHS, batch_size=32,
callbacks=[early_stopping])
Epoch 1/20 298/298 [==============================] - 9s 25ms/step - loss: 0.0166 - mean_absolute_error: 0.0870 - val_loss: 0.0122 - val_mean_absolute_error: 0.0796 Epoch 2/20 298/298 [==============================] - 7s 23ms/step - loss: 0.0087 - mean_absolute_error: 0.0538 - val_loss: 0.0102 - val_mean_absolute_error: 0.0561 Epoch 3/20 298/298 [==============================] - 7s 23ms/step - loss: 0.0079 - mean_absolute_error: 0.0477 - val_loss: 0.0098 - val_mean_absolute_error: 0.0505 Epoch 4/20 298/298 [==============================] - 7s 22ms/step - loss: 0.0075 - mean_absolute_error: 0.0453 - val_loss: 0.0096 - val_mean_absolute_error: 0.0501 Epoch 5/20 298/298 [==============================] - 7s 22ms/step - loss: 0.0073 - mean_absolute_error: 0.0443 - val_loss: 0.0096 - val_mean_absolute_error: 0.0512 Epoch 6/20 298/298 [==============================] - 7s 22ms/step - loss: 0.0072 - mean_absolute_error: 0.0440 - val_loss: 0.0095 - val_mean_absolute_error: 0.0470 Epoch 7/20 298/298 [==============================] - 7s 22ms/step - loss: 0.0071 - mean_absolute_error: 0.0435 - val_loss: 0.0090 - val_mean_absolute_error: 0.0396 Epoch 8/20 298/298 [==============================] - 7s 23ms/step - loss: 0.0070 - mean_absolute_error: 0.0433 - val_loss: 0.0092 - val_mean_absolute_error: 0.0415 Epoch 9/20 298/298 [==============================] - 7s 24ms/step - loss: 0.0070 - mean_absolute_error: 0.0429 - val_loss: 0.0089 - val_mean_absolute_error: 0.0402 Epoch 10/20 298/298 [==============================] - 7s 23ms/step - loss: 0.0069 - mean_absolute_error: 0.0426 - val_loss: 0.0091 - val_mean_absolute_error: 0.0426 Epoch 11/20 298/298 [==============================] - 7s 25ms/step - loss: 0.0068 - mean_absolute_error: 0.0421 - val_loss: 0.0093 - val_mean_absolute_error: 0.0499 Epoch 12/20 298/298 [==============================] - 7s 23ms/step - loss: 0.0068 - mean_absolute_error: 0.0424 - val_loss: 0.0090 - val_mean_absolute_error: 0.0366 Epoch 13/20 
298/298 [==============================] - 7s 23ms/step - loss: 0.0068 - mean_absolute_error: 0.0424 - val_loss: 0.0094 - val_mean_absolute_error: 0.0491 Epoch 14/20 298/298 [==============================] - 7s 24ms/step - loss: 0.0068 - mean_absolute_error: 0.0419 - val_loss: 0.0091 - val_mean_absolute_error: 0.0407 Epoch 15/20 298/298 [==============================] - 7s 23ms/step - loss: 0.0067 - mean_absolute_error: 0.0417 - val_loss: 0.0096 - val_mean_absolute_error: 0.0492 Epoch 16/20 298/298 [==============================] - 7s 23ms/step - loss: 0.0067 - mean_absolute_error: 0.0416 - val_loss: 0.0097 - val_mean_absolute_error: 0.0532 Epoch 17/20 298/298 [==============================] - 7s 23ms/step - loss: 0.0066 - mean_absolute_error: 0.0415 - val_loss: 0.0092 - val_mean_absolute_error: 0.0423 Epoch 18/20 298/298 [==============================] - 7s 23ms/step - loss: 0.0066 - mean_absolute_error: 0.0415 - val_loss: 0.0091 - val_mean_absolute_error: 0.0398 Epoch 19/20 298/298 [==============================] - 7s 23ms/step - loss: 0.0066 - mean_absolute_error: 0.0415 - val_loss: 0.0093 - val_mean_absolute_error: 0.0404 Epoch 20/20 298/298 [==============================] - 7s 23ms/step - loss: 0.0066 - mean_absolute_error: 0.0413 - val_loss: 0.0092 - val_mean_absolute_error: 0.0399
# plot the training curves via plot_loss and plot_mae
plot_loss(history_2, 'Model 2')
plot_mae(history_2, 'Model 2')
# show the model evaluation results
eval(model_2, X_train_window_2, X_val_window_2, X_test_window_2, y_train_window_2, y_val_window_2, y_test_window_2)
298/298 [==============================] - 2s 8ms/step
59/59 [==============================] - 0s 8ms/step
59/59 [==============================] - 0s 8ms/step
Evaluation results:
MAE on train data:  0.03942
MSE on train data:  0.00651
MAE on validation data:  0.03991
MSE on validation data:  0.00915
MAE on test data:  0.05786
MSE on test data:  0.01311
# prediction: plot a few test windows (run predict once, then index into it)
y_pred_test_2 = model_2.predict(X_test_window_2)
for i in [791, 1003, 1015, 1070]:
    plot_pred0(y_test_window_2[(i-window_size_2):i, 0],
               y_test_window_2[i],
               y_pred_test_2[i])
59/59 [==============================] - 1s 10ms/step
# prediction: same windows, with the y-axis fixed to [0, 1] (plot_pred1)
y_pred_test_2 = model_2.predict(X_test_window_2)
for i in [791, 1003, 1015, 1070]:
    plot_pred1(y_test_window_2[(i-window_size_2):i, 0],
               y_test_window_2[i],
               y_pred_test_2[i])
59/59 [==============================] - 1s 10ms/step
# Build the GRU model
model_3 = Sequential()
model_3.add(GRU(window_size_3,
input_shape=(window_size_3, X_train.shape[1]),
return_sequences=False))
model_3.add(Dense(future_size_3))
model_3.summary()
Model: "sequential_2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
gru_2 (GRU) (None, 288) 254880
dense_2 (Dense) (None, 36) 10404
=================================================================
Total params: 265,284
Trainable params: 265,284
Non-trainable params: 0
_________________________________________________________________
model_3.compile(loss=tf.keras.losses.MeanSquaredError(),
                optimizer=Adam(learning_rate=0.0001),
                metrics=[tf.keras.metrics.MeanAbsoluteError()])
early_stopping = EarlyStopping(monitor='val_loss',
patience=PATIENCE,
verbose=1,
restore_best_weights=True)
history_3 = model_3.fit(X_train_window_3, y_train_window_3,
validation_data=(X_val_window_3, y_val_window_3),
epochs=EPOCHS, batch_size=32,
callbacks=[early_stopping])
Epoch 1/20 294/294 [==============================] - 27s 87ms/step - loss: 0.0105 - mean_absolute_error: 0.0638 - val_loss: 0.0076 - val_mean_absolute_error: 0.0402 Epoch 2/20 294/294 [==============================] - 27s 91ms/step - loss: 0.0050 - mean_absolute_error: 0.0360 - val_loss: 0.0073 - val_mean_absolute_error: 0.0342 Epoch 3/20 294/294 [==============================] - 28s 95ms/step - loss: 0.0045 - mean_absolute_error: 0.0324 - val_loss: 0.0069 - val_mean_absolute_error: 0.0270 Epoch 4/20 294/294 [==============================] - 28s 94ms/step - loss: 0.0043 - mean_absolute_error: 0.0316 - val_loss: 0.0067 - val_mean_absolute_error: 0.0340 Epoch 5/20 294/294 [==============================] - 25s 85ms/step - loss: 0.0041 - mean_absolute_error: 0.0302 - val_loss: 0.0063 - val_mean_absolute_error: 0.0283 Epoch 6/20 294/294 [==============================] - 27s 91ms/step - loss: 0.0040 - mean_absolute_error: 0.0301 - val_loss: 0.0066 - val_mean_absolute_error: 0.0296 Epoch 7/20 294/294 [==============================] - 26s 89ms/step - loss: 0.0040 - mean_absolute_error: 0.0293 - val_loss: 0.0064 - val_mean_absolute_error: 0.0251 Epoch 8/20 294/294 [==============================] - 25s 87ms/step - loss: 0.0039 - mean_absolute_error: 0.0289 - val_loss: 0.0066 - val_mean_absolute_error: 0.0342 Epoch 9/20 294/294 [==============================] - 27s 93ms/step - loss: 0.0039 - mean_absolute_error: 0.0289 - val_loss: 0.0064 - val_mean_absolute_error: 0.0291 Epoch 10/20 294/294 [==============================] - 28s 96ms/step - loss: 0.0038 - mean_absolute_error: 0.0285 - val_loss: 0.0063 - val_mean_absolute_error: 0.0266 Epoch 11/20 294/294 [==============================] - 29s 98ms/step - loss: 0.0038 - mean_absolute_error: 0.0287 - val_loss: 0.0062 - val_mean_absolute_error: 0.0263 Epoch 12/20 294/294 [==============================] - 27s 93ms/step - loss: 0.0038 - mean_absolute_error: 0.0281 - val_loss: 0.0064 - val_mean_absolute_error: 0.0349 
Epoch 13/20 294/294 [==============================] - 28s 95ms/step - loss: 0.0037 - mean_absolute_error: 0.0287 - val_loss: 0.0062 - val_mean_absolute_error: 0.0253 Epoch 14/20 294/294 [==============================] - 27s 91ms/step - loss: 0.0037 - mean_absolute_error: 0.0281 - val_loss: 0.0064 - val_mean_absolute_error: 0.0297 Epoch 15/20 294/294 [==============================] - 29s 98ms/step - loss: 0.0037 - mean_absolute_error: 0.0280 - val_loss: 0.0064 - val_mean_absolute_error: 0.0323 Epoch 16/20 294/294 [==============================] - 25s 84ms/step - loss: 0.0037 - mean_absolute_error: 0.0283 - val_loss: 0.0065 - val_mean_absolute_error: 0.0268 Epoch 17/20 294/294 [==============================] - 29s 99ms/step - loss: 0.0036 - mean_absolute_error: 0.0281 - val_loss: 0.0063 - val_mean_absolute_error: 0.0309 Epoch 18/20 294/294 [==============================] - 28s 94ms/step - loss: 0.0036 - mean_absolute_error: 0.0279 - val_loss: 0.0061 - val_mean_absolute_error: 0.0285 Epoch 19/20 294/294 [==============================] - 27s 93ms/step - loss: 0.0036 - mean_absolute_error: 0.0277 - val_loss: 0.0060 - val_mean_absolute_error: 0.0249 Epoch 20/20 294/294 [==============================] - 27s 91ms/step - loss: 0.0036 - mean_absolute_error: 0.0276 - val_loss: 0.0060 - val_mean_absolute_error: 0.0252
# Plot the training history via the plot_loss and plot_mae helpers
plot_loss(history_3, 'Model 3')
plot_mae(history_3, 'Model 3')
# show the model evaluation results (the eval helper is defined earlier in the notebook)
eval(model_3, X_train_window_3, X_val_window_3, X_test_window_3, y_train_window_3, y_val_window_3, y_test_window_3)
294/294 [==============================] - 9s 29ms/step
55/55 [==============================] - 2s 29ms/step
55/55 [==============================] - 2s 29ms/step
Evaluation results:
MAE on train data: 0.02495
MSE on train data: 0.00354
MAE on validation data: 0.0252
MSE on validation data: 0.00601
MAE on test data: 0.03635
MSE on test data: 0.00658
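The `eval` helper used above is defined earlier in the notebook and not shown here. As an illustrative sketch only (the helper name `eval_sketch` and its exact layout are assumptions), such a function computes MAE and MSE on the scaled values of each split with sklearn metrics and a Keras-style `predict`:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

def eval_sketch(model, X_splits, y_splits, names=('train', 'validation', 'test')):
    """Report MAE/MSE of a Keras-style model on several data splits (scaled units)."""
    print('Evaluation results:')
    for name, X, y in zip(names, X_splits, y_splits):
        y_pred = model.predict(X)
        print(f'MAE on {name} data:', round(mean_absolute_error(y, y_pred), 5))
        print(f'MSE on {name} data:', round(mean_squared_error(y, y_pred), 5))
```

Because the metrics are computed on MinMax-scaled targets, they are unitless; multiplying by the scaler's range would convert them back to centimetres.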
# prediction plots (note: model_3.predict() runs once per loop iteration, hence the repeated progress bars below)
for i in [692, 857, 872, 945]:
    plot_pred0(y_test_window_3[(i - window_size_3):i, 0],
               y_test_window_3[i],
               model_3.predict(X_test_window_3)[i])
55/55 [==============================] - 2s 31ms/step
55/55 [==============================] - 2s 32ms/step
55/55 [==============================] - 2s 31ms/step
55/55 [==============================] - 2s 32ms/step
# prediction
for i in [692, 857, 872, 945]:
    plot_pred1(y_test_window_3[(i - window_size_3):i, 0],
               y_test_window_3[i],
               model_3.predict(X_test_window_3)[i])
55/55 [==============================] - 2s 31ms/step
55/55 [==============================] - 2s 32ms/step
55/55 [==============================] - 2s 30ms/step
55/55 [==============================] - 2s 31ms/step
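The `X_*_window_*` and `y_*_window_*` arrays are built by a windowing step earlier in the notebook. As an illustrative sketch (the notebook's actual helper may differ; `make_windows` is a hypothetical name), a sliding window turns a `(T, n_features)` series into `(samples, window, features)` inputs and `(samples, future)` targets:

```python
import numpy as np

def make_windows(features, target, window_size, future_size):
    """Slice a time series into model inputs and multi-step targets.

    features: (T, n_features) array; target: (T,) array.
    Returns X of shape (N, window_size, n_features) and y of shape (N, future_size).
    """
    X, y = [], []
    for i in range(len(features) - window_size - future_size + 1):
        X.append(features[i:i + window_size])                         # past window as input
        y.append(target[i + window_size:i + window_size + future_size])  # next steps as target
    return np.array(X), np.array(y)
```

With 10-minute steps, `window_size = 288` corresponds to a 48-hour history and `future_size = 36` to a 6-hour forecast horizon.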
# Build the GRU model
model_4 = Sequential()
model_4.add(GRU(window_size_4,
                input_shape=(window_size_4, X_train.shape[1]),
                return_sequences=False))
model_4.add(Dense(future_size_4))
model_4.summary()
Model: "sequential_3"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
gru_3 (GRU) (None, 288) 254880
dense_3 (Dense) (None, 72) 20808
=================================================================
Total params: 275,688
Trainable params: 275,688
Non-trainable params: 0
_________________________________________________________________
# Compile (learning_rate replaces the deprecated lr alias)
model_4.compile(loss=tf.keras.losses.MeanSquaredError(),
                optimizer=Adam(learning_rate=0.0001),
                metrics=[tf.keras.metrics.MeanAbsoluteError()])
early_stopping = EarlyStopping(monitor='val_loss',
                               patience=PATIENCE,
                               verbose=1,
                               restore_best_weights=True)
history_4 = model_4.fit(X_train_window_4, y_train_window_4,
                        validation_data=(X_val_window_4, y_val_window_4),
                        epochs=EPOCHS, batch_size=32,
                        callbacks=[early_stopping])
Epoch 1/20
293/293 [==============================] - 28s 87ms/step - loss: 0.0133 - mean_absolute_error: 0.0747 - val_loss: 0.0116 - val_mean_absolute_error: 0.0668
Epoch 2/20
293/293 [==============================] - 25s 85ms/step - loss: 0.0080 - mean_absolute_error: 0.0481 - val_loss: 0.0105 - val_mean_absolute_error: 0.0518
Epoch 3/20
293/293 [==============================] - 26s 88ms/step - loss: 0.0075 - mean_absolute_error: 0.0453 - val_loss: 0.0105 - val_mean_absolute_error: 0.0564
Epoch 4/20
293/293 [==============================] - 29s 99ms/step - loss: 0.0073 - mean_absolute_error: 0.0445 - val_loss: 0.0101 - val_mean_absolute_error: 0.0501
Epoch 5/20
293/293 [==============================] - 27s 93ms/step - loss: 0.0071 - mean_absolute_error: 0.0432 - val_loss: 0.0101 - val_mean_absolute_error: 0.0505
Epoch 6/20
293/293 [==============================] - 27s 91ms/step - loss: 0.0070 - mean_absolute_error: 0.0428 - val_loss: 0.0104 - val_mean_absolute_error: 0.0606
Epoch 7/20
293/293 [==============================] - 28s 96ms/step - loss: 0.0070 - mean_absolute_error: 0.0431 - val_loss: 0.0103 - val_mean_absolute_error: 0.0550
Epoch 8/20
293/293 [==============================] - 30s 102ms/step - loss: 0.0069 - mean_absolute_error: 0.0426 - val_loss: 0.0106 - val_mean_absolute_error: 0.0598
Epoch 9/20
293/293 [==============================] - 27s 93ms/step - loss: 0.0069 - mean_absolute_error: 0.0425 - val_loss: 0.0099 - val_mean_absolute_error: 0.0492
Epoch 10/20
293/293 [==============================] - 26s 90ms/step - loss: 0.0068 - mean_absolute_error: 0.0423 - val_loss: 0.0102 - val_mean_absolute_error: 0.0511
Epoch 11/20
293/293 [==============================] - 27s 91ms/step - loss: 0.0068 - mean_absolute_error: 0.0420 - val_loss: 0.0097 - val_mean_absolute_error: 0.0445
Epoch 12/20
293/293 [==============================] - 28s 97ms/step - loss: 0.0067 - mean_absolute_error: 0.0419 - val_loss: 0.0099 - val_mean_absolute_error: 0.0383
Epoch 13/20
293/293 [==============================] - 25s 84ms/step - loss: 0.0067 - mean_absolute_error: 0.0417 - val_loss: 0.0097 - val_mean_absolute_error: 0.0418
Epoch 14/20
293/293 [==============================] - 25s 85ms/step - loss: 0.0067 - mean_absolute_error: 0.0418 - val_loss: 0.0099 - val_mean_absolute_error: 0.0469
Epoch 15/20
293/293 [==============================] - 28s 96ms/step - loss: 0.0066 - mean_absolute_error: 0.0420 - val_loss: 0.0098 - val_mean_absolute_error: 0.0479
Epoch 16/20
293/293 [==============================] - 28s 95ms/step - loss: 0.0066 - mean_absolute_error: 0.0416 - val_loss: 0.0096 - val_mean_absolute_error: 0.0412
Epoch 17/20
293/293 [==============================] - 29s 99ms/step - loss: 0.0065 - mean_absolute_error: 0.0411 - val_loss: 0.0093 - val_mean_absolute_error: 0.0357
Epoch 18/20
293/293 [==============================] - 27s 91ms/step - loss: 0.0064 - mean_absolute_error: 0.0410 - val_loss: 0.0097 - val_mean_absolute_error: 0.0385
Epoch 19/20
293/293 [==============================] - 25s 85ms/step - loss: 0.0064 - mean_absolute_error: 0.0410 - val_loss: 0.0097 - val_mean_absolute_error: 0.0453
Epoch 20/20
293/293 [==============================] - 25s 84ms/step - loss: 0.0063 - mean_absolute_error: 0.0410 - val_loss: 0.0092 - val_mean_absolute_error: 0.0418
# Plot the training history via the plot_loss and plot_mae helpers
plot_loss(history_4, 'Model 4')
plot_mae(history_4, 'Model 4')
# show the model evaluation results
eval(model_4, X_train_window_4, X_val_window_4, X_test_window_4, y_train_window_4, y_val_window_4, y_test_window_4)
293/293 [==============================] - 9s 29ms/step
54/54 [==============================] - 2s 29ms/step
54/54 [==============================] - 2s 29ms/step
Evaluation results:
MAE on train data: 0.04182
MSE on train data: 0.00638
MAE on validation data: 0.04182
MSE on validation data: 0.00918
MAE on test data: 0.07094
MSE on test data: 0.01612
# prediction
for i in [697, 859, 870, 925]:
    plot_pred0(y_test_window_4[(i - window_size_4):i, 0],
               y_test_window_4[i],
               model_4.predict(X_test_window_4)[i])
54/54 [==============================] - 2s 30ms/step
54/54 [==============================] - 2s 29ms/step
54/54 [==============================] - 2s 30ms/step
54/54 [==============================] - 2s 29ms/step
# prediction
for i in [697, 859, 870, 925]:
    plot_pred1(y_test_window_4[(i - window_size_4):i, 0],
               y_test_window_4[i],
               model_4.predict(X_test_window_4)[i])
54/54 [==============================] - 2s 30ms/step
54/54 [==============================] - 2s 30ms/step
54/54 [==============================] - 2s 29ms/step
54/54 [==============================] - 2s 31ms/step
Model 3 (window configuration 3) achieved the best evaluation results; next, its hyperparameters are tuned to improve the results further.
# '''
# Tuning changes:
# - additional GRU and Dense layers ----------> capture more complex patterns
# - activation: tanh -------------------------> help mitigate vanishing gradients
# - kernel initializer: GlorotNormal ---------> weight initialization
# - bias initializer: Zeros ------------------> bias initialization
# - more epochs: 50 --------------------------> more training iterations
# - learning rate scheduler ------------------> decay the learning rate as training progresses
# '''
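The `lr_schedule` function handed to `LearningRateScheduler` in the compile cell is defined earlier in the notebook and not shown here. A step-decay sketch consistent with the learning rates reported in the training log (1.0000e-04 through epoch 26, then 1.0000e-05) would be:

```python
def lr_schedule(epoch, lr=None):
    """Illustrative step decay: drop the learning rate 10x after epoch 26.

    This is a sketch matching the logged lr values, not necessarily the
    notebook's actual schedule.
    """
    return 1e-4 if epoch < 26 else 1e-5
```

Keras calls the schedule with the 0-based epoch index (and the current lr) at the start of each epoch.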
model_tuned = Sequential()
model_tuned.add(GRU(256,
                    input_shape=(window_size_3, X_train.shape[1]),
                    return_sequences=True,
                    kernel_initializer=GlorotNormal(),
                    bias_initializer=Zeros(),
                    activation='tanh'))
model_tuned.add(GRU(128,
                    activation='tanh',
                    return_sequences=True,
                    kernel_initializer=GlorotNormal(),
                    bias_initializer=Zeros()))
model_tuned.add(GRU(64,
                    activation='tanh',
                    return_sequences=True,
                    kernel_initializer=GlorotNormal(),
                    bias_initializer=Zeros()))
model_tuned.add(GRU(64,
                    activation='tanh',
                    return_sequences=False,
                    kernel_initializer=GlorotNormal(),
                    bias_initializer=Zeros()))
model_tuned.add(Dense(64,
                      activation='tanh',
                      kernel_initializer=GlorotNormal(),
                      bias_initializer=Zeros()))
model_tuned.add(Dense(future_size_3,
                      activation='tanh',
                      kernel_initializer=GlorotNormal(),
                      bias_initializer=Zeros()))
model_tuned.summary()
Model: "sequential_47"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
gru_137 (GRU) (None, 288, 256) 201984
gru_138 (GRU) (None, 288, 128) 148224
gru_139 (GRU) (None, 288, 64) 37248
gru_140 (GRU) (None, 64) 24960
dense_77 (Dense) (None, 64) 4160
dense_78 (Dense) (None, 36) 2340
=================================================================
Total params: 418,916
Trainable params: 418,916
Non-trainable params: 0
_________________________________________________________________
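As a sanity check on the summary above: for a Keras GRU with the default `reset_after=True`, the parameter count is 3·(d·u + u² + 2u) for input dimension d and u units, and a Dense layer has d·u + u. Assuming 5 input features (so the first GRU sees d = 5), these formulas reproduce every number in the summary; this is an illustrative check, not notebook code:

```python
def gru_params(input_dim, units):
    # Keras GRU, reset_after=True: 3 gates, each with input weights,
    # recurrent weights, and two bias vectors
    return 3 * (input_dim * units + units * units + 2 * units)

def dense_params(input_dim, units):
    # weights plus one bias per unit
    return input_dim * units + units

total = (gru_params(5, 256) + gru_params(256, 128) + gru_params(128, 64)
         + gru_params(64, 64) + dense_params(64, 64) + dense_params(64, 36))
```

For example, the first layer gives 3·(5·256 + 256² + 2·256) = 201,984, and `total` equals the reported 418,916.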
model_tuned.compile(loss=tf.keras.losses.MeanSquaredError(),
                    optimizer=Adam(learning_rate=0.0001),
                    metrics=[tf.keras.metrics.MeanAbsoluteError()])
# Define learning rate scheduler callback
lr_scheduler = LearningRateScheduler(lr_schedule)
early_stopping = EarlyStopping(monitor='val_loss',
                               patience=50,
                               verbose=1,
                               restore_best_weights=True)
history_tuned = model_tuned.fit(X_train_window_3, y_train_window_3,
                                validation_data=(X_val_window_3, y_val_window_3),
                                epochs=50, batch_size=32,
                                callbacks=[early_stopping, lr_scheduler])
Epoch 1/50
294/294 [==============================] - 76s 243ms/step - loss: 0.0109 - mean_absolute_error: 0.0623 - val_loss: 0.0082 - val_mean_absolute_error: 0.0439 - lr: 1.0000e-04
Epoch 2/50
294/294 [==============================] - 47s 159ms/step - loss: 0.0054 - mean_absolute_error: 0.0369 - val_loss: 0.0073 - val_mean_absolute_error: 0.0318 - lr: 1.0000e-04
Epoch 3/50
294/294 [==============================] - 55s 189ms/step - loss: 0.0050 - mean_absolute_error: 0.0349 - val_loss: 0.0069 - val_mean_absolute_error: 0.0299 - lr: 1.0000e-04
Epoch 4/50
294/294 [==============================] - 51s 175ms/step - loss: 0.0046 - mean_absolute_error: 0.0328 - val_loss: 0.0066 - val_mean_absolute_error: 0.0303 - lr: 1.0000e-04
Epoch 5/50
294/294 [==============================] - 48s 163ms/step - loss: 0.0043 - mean_absolute_error: 0.0317 - val_loss: 0.0063 - val_mean_absolute_error: 0.0318 - lr: 1.0000e-04
Epoch 6/50
294/294 [==============================] - 48s 163ms/step - loss: 0.0041 - mean_absolute_error: 0.0309 - val_loss: 0.0062 - val_mean_absolute_error: 0.0264 - lr: 1.0000e-04
Epoch 7/50
294/294 [==============================] - 48s 165ms/step - loss: 0.0039 - mean_absolute_error: 0.0302 - val_loss: 0.0060 - val_mean_absolute_error: 0.0269 - lr: 1.0000e-04
Epoch 8/50
294/294 [==============================] - 49s 167ms/step - loss: 0.0038 - mean_absolute_error: 0.0292 - val_loss: 0.0059 - val_mean_absolute_error: 0.0264 - lr: 1.0000e-04
Epoch 9/50
294/294 [==============================] - 49s 166ms/step - loss: 0.0037 - mean_absolute_error: 0.0295 - val_loss: 0.0059 - val_mean_absolute_error: 0.0311 - lr: 1.0000e-04
Epoch 10/50
294/294 [==============================] - 49s 166ms/step - loss: 0.0037 - mean_absolute_error: 0.0295 - val_loss: 0.0060 - val_mean_absolute_error: 0.0310 - lr: 1.0000e-04
Epoch 11/50
294/294 [==============================] - 49s 166ms/step - loss: 0.0036 - mean_absolute_error: 0.0289 - val_loss: 0.0058 - val_mean_absolute_error: 0.0284 - lr: 1.0000e-04
Epoch 12/50
294/294 [==============================] - 49s 166ms/step - loss: 0.0036 - mean_absolute_error: 0.0284 - val_loss: 0.0064 - val_mean_absolute_error: 0.0348 - lr: 1.0000e-04
Epoch 13/50
294/294 [==============================] - 49s 166ms/step - loss: 0.0036 - mean_absolute_error: 0.0285 - val_loss: 0.0058 - val_mean_absolute_error: 0.0302 - lr: 1.0000e-04
Epoch 14/50
294/294 [==============================] - 49s 166ms/step - loss: 0.0035 - mean_absolute_error: 0.0284 - val_loss: 0.0060 - val_mean_absolute_error: 0.0327 - lr: 1.0000e-04
Epoch 15/50
294/294 [==============================] - 49s 166ms/step - loss: 0.0035 - mean_absolute_error: 0.0286 - val_loss: 0.0058 - val_mean_absolute_error: 0.0261 - lr: 1.0000e-04
Epoch 16/50
294/294 [==============================] - 53s 182ms/step - loss: 0.0035 - mean_absolute_error: 0.0278 - val_loss: 0.0062 - val_mean_absolute_error: 0.0282 - lr: 1.0000e-04
Epoch 17/50
294/294 [==============================] - 49s 166ms/step - loss: 0.0034 - mean_absolute_error: 0.0272 - val_loss: 0.0060 - val_mean_absolute_error: 0.0254 - lr: 1.0000e-04
Epoch 18/50
294/294 [==============================] - 49s 166ms/step - loss: 0.0034 - mean_absolute_error: 0.0274 - val_loss: 0.0059 - val_mean_absolute_error: 0.0280 - lr: 1.0000e-04
Epoch 19/50
294/294 [==============================] - 49s 166ms/step - loss: 0.0034 - mean_absolute_error: 0.0271 - val_loss: 0.0062 - val_mean_absolute_error: 0.0329 - lr: 1.0000e-04
Epoch 20/50
294/294 [==============================] - 49s 166ms/step - loss: 0.0034 - mean_absolute_error: 0.0276 - val_loss: 0.0057 - val_mean_absolute_error: 0.0270 - lr: 1.0000e-04
Epoch 21/50
294/294 [==============================] - 49s 166ms/step - loss: 0.0033 - mean_absolute_error: 0.0271 - val_loss: 0.0057 - val_mean_absolute_error: 0.0258 - lr: 1.0000e-04
Epoch 22/50
294/294 [==============================] - 49s 166ms/step - loss: 0.0034 - mean_absolute_error: 0.0274 - val_loss: 0.0059 - val_mean_absolute_error: 0.0253 - lr: 1.0000e-04
Epoch 23/50
294/294 [==============================] - 49s 168ms/step - loss: 0.0034 - mean_absolute_error: 0.0279 - val_loss: 0.0059 - val_mean_absolute_error: 0.0252 - lr: 1.0000e-04
Epoch 24/50
294/294 [==============================] - 50s 169ms/step - loss: 0.0033 - mean_absolute_error: 0.0264 - val_loss: 0.0057 - val_mean_absolute_error: 0.0236 - lr: 1.0000e-04
Epoch 25/50
294/294 [==============================] - 53s 181ms/step - loss: 0.0033 - mean_absolute_error: 0.0271 - val_loss: 0.0057 - val_mean_absolute_error: 0.0241 - lr: 1.0000e-04
Epoch 26/50
294/294 [==============================] - 49s 166ms/step - loss: 0.0032 - mean_absolute_error: 0.0264 - val_loss: 0.0061 - val_mean_absolute_error: 0.0309 - lr: 1.0000e-04
Epoch 27/50
294/294 [==============================] - 49s 167ms/step - loss: 0.0032 - mean_absolute_error: 0.0253 - val_loss: 0.0058 - val_mean_absolute_error: 0.0275 - lr: 1.0000e-05
Epoch 28/50
294/294 [==============================] - 49s 166ms/step - loss: 0.0031 - mean_absolute_error: 0.0251 - val_loss: 0.0058 - val_mean_absolute_error: 0.0260 - lr: 1.0000e-05
Epoch 29/50
294/294 [==============================] - 49s 166ms/step - loss: 0.0031 - mean_absolute_error: 0.0250 - val_loss: 0.0058 - val_mean_absolute_error: 0.0259 - lr: 1.0000e-05
Epoch 30/50
294/294 [==============================] - 49s 167ms/step - loss: 0.0031 - mean_absolute_error: 0.0250 - val_loss: 0.0057 - val_mean_absolute_error: 0.0263 - lr: 1.0000e-05
Epoch 31/50
294/294 [==============================] - 50s 169ms/step - loss: 0.0031 - mean_absolute_error: 0.0251 - val_loss: 0.0058 - val_mean_absolute_error: 0.0266 - lr: 1.0000e-05
Epoch 32/50
294/294 [==============================] - 50s 170ms/step - loss: 0.0031 - mean_absolute_error: 0.0249 - val_loss: 0.0059 - val_mean_absolute_error: 0.0284 - lr: 1.0000e-05
Epoch 33/50
294/294 [==============================] - 50s 169ms/step - loss: 0.0031 - mean_absolute_error: 0.0251 - val_loss: 0.0059 - val_mean_absolute_error: 0.0252 - lr: 1.0000e-05
Epoch 34/50
294/294 [==============================] - 50s 169ms/step - loss: 0.0031 - mean_absolute_error: 0.0250 - val_loss: 0.0058 - val_mean_absolute_error: 0.0259 - lr: 1.0000e-05
Epoch 35/50
294/294 [==============================] - 50s 170ms/step - loss: 0.0031 - mean_absolute_error: 0.0250 - val_loss: 0.0059 - val_mean_absolute_error: 0.0279 - lr: 1.0000e-05
Epoch 36/50
294/294 [==============================] - 52s 178ms/step - loss: 0.0031 - mean_absolute_error: 0.0250 - val_loss: 0.0058 - val_mean_absolute_error: 0.0258 - lr: 1.0000e-05
Epoch 37/50
294/294 [==============================] - 49s 168ms/step - loss: 0.0031 - mean_absolute_error: 0.0250 - val_loss: 0.0058 - val_mean_absolute_error: 0.0258 - lr: 1.0000e-05
Epoch 38/50
294/294 [==============================] - 54s 185ms/step - loss: 0.0031 - mean_absolute_error: 0.0249 - val_loss: 0.0058 - val_mean_absolute_error: 0.0275 - lr: 1.0000e-05
Epoch 39/50
294/294 [==============================] - 50s 169ms/step - loss: 0.0031 - mean_absolute_error: 0.0250 - val_loss: 0.0059 - val_mean_absolute_error: 0.0275 - lr: 1.0000e-05
Epoch 40/50
294/294 [==============================] - 50s 169ms/step - loss: 0.0031 - mean_absolute_error: 0.0250 - val_loss: 0.0058 - val_mean_absolute_error: 0.0253 - lr: 1.0000e-05
Epoch 41/50
294/294 [==============================] - 50s 169ms/step - loss: 0.0031 - mean_absolute_error: 0.0249 - val_loss: 0.0058 - val_mean_absolute_error: 0.0271 - lr: 1.0000e-05
Epoch 42/50
294/294 [==============================] - 50s 169ms/step - loss: 0.0031 - mean_absolute_error: 0.0250 - val_loss: 0.0058 - val_mean_absolute_error: 0.0256 - lr: 1.0000e-05
Epoch 43/50
294/294 [==============================] - 50s 170ms/step - loss: 0.0031 - mean_absolute_error: 0.0250 - val_loss: 0.0058 - val_mean_absolute_error: 0.0270 - lr: 1.0000e-05
Epoch 44/50
294/294 [==============================] - 50s 169ms/step - loss: 0.0031 - mean_absolute_error: 0.0250 - val_loss: 0.0057 - val_mean_absolute_error: 0.0256 - lr: 1.0000e-05
Epoch 45/50
294/294 [==============================] - 50s 169ms/step - loss: 0.0031 - mean_absolute_error: 0.0250 - val_loss: 0.0059 - val_mean_absolute_error: 0.0253 - lr: 1.0000e-05
Epoch 46/50
294/294 [==============================] - 50s 169ms/step - loss: 0.0031 - mean_absolute_error: 0.0249 - val_loss: 0.0058 - val_mean_absolute_error: 0.0266 - lr: 1.0000e-05
Epoch 47/50
294/294 [==============================] - 50s 169ms/step - loss: 0.0031 - mean_absolute_error: 0.0249 - val_loss: 0.0058 - val_mean_absolute_error: 0.0273 - lr: 1.0000e-05
Epoch 48/50
294/294 [==============================] - 50s 169ms/step - loss: 0.0031 - mean_absolute_error: 0.0249 - val_loss: 0.0059 - val_mean_absolute_error: 0.0289 - lr: 1.0000e-05
Epoch 49/50
294/294 [==============================] - 50s 169ms/step - loss: 0.0031 - mean_absolute_error: 0.0249 - val_loss: 0.0058 - val_mean_absolute_error: 0.0275 - lr: 1.0000e-05
Epoch 50/50
294/294 [==============================] - 57s 195ms/step - loss: 0.0031 - mean_absolute_error: 0.0250 - val_loss: 0.0058 - val_mean_absolute_error: 0.0268 - lr: 1.0000e-05
# Plot the training history via the plot_loss and plot_mae helpers
plot_loss(history_tuned, 'Model tuned')
plot_mae(history_tuned, 'Model tuned')
# show the model evaluation results
eval(model_tuned, X_train_window_3, X_val_window_3, X_test_window_3, y_train_window_3, y_val_window_3, y_test_window_3)
294/294 [==============================] - 90s 287ms/step
55/55 [==============================] - 16s 291ms/step
55/55 [==============================] - 16s 290ms/step
Evaluation results:
MAE on train data: 0.02501
MSE on train data: 0.00307
MAE on validation data: 0.0268
MSE on validation data: 0.0058
MAE on test data: 0.03543
MSE on test data: 0.00586
# prediction
for i in [692, 857, 872, 945]:
    plot_pred0(y_test_window_3[(i - window_size_3):i, 0],
               y_test_window_3[i],
               model_tuned.predict(X_test_window_3)[i])
55/55 [==============================] - 3s 57ms/step
55/55 [==============================] - 3s 60ms/step
55/55 [==============================] - 3s 61ms/step
55/55 [==============================] - 3s 60ms/step
# prediction
for i in [692, 857, 872, 945]:
    plot_pred1(y_test_window_3[(i - window_size_3):i, 0],
               y_test_window_3[i],
               model_tuned.predict(X_test_window_3)[i])
55/55 [==============================] - 3s 60ms/step
55/55 [==============================] - 3s 62ms/step
55/55 [==============================] - 3s 62ms/step
55/55 [==============================] - 3s 61ms/step
Performance improved after hyperparameter tuning: the tuned model predicts the flood peaks more accurately.
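The gain can be quantified from the reported test metrics (Model 3: MAE 0.03635, MSE 0.00658; tuned model: MAE 0.03543, MSE 0.00586), taken here straight from the evaluation cells above:

```python
# reported test metrics before and after tuning (scaled units)
mae_before, mae_after = 0.03635, 0.03543
mse_before, mse_after = 0.00658, 0.00586

# relative improvement in percent
mae_gain = 100 * (mae_before - mae_after) / mae_before
mse_gain = 100 * (mse_before - mse_after) / mse_before
print(f'test MAE improved by {mae_gain:.1f}%')
print(f'test MSE improved by {mse_gain:.1f}%')
```

The MAE improvement is modest (about 2.5%), while the MSE drops by roughly 11%, which is consistent with the tuned model handling the large peak errors better.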
# # save the model in .h5 format (earlier Colab path)
# model_final.save('/content/drive/MyDrive/dataset_skripsi/model_banjir/h5/model_prediksi_banjir.h5')
# save the model in .h5 format
model_tuned.save('model/model_prediksi_banjir.h5')
# Load the model back from the HDF5 file
from tensorflow.keras.models import load_model
loaded_model = load_model('model/model_prediksi_banjir.h5')
# show the evaluation results of the reloaded model (they match the tuned model exactly)
eval(loaded_model, X_train_window_3, X_val_window_3, X_test_window_3, y_train_window_3, y_val_window_3, y_test_window_3)
294/294 [==============================] - 19s 58ms/step
55/55 [==============================] - 3s 58ms/step
55/55 [==============================] - 3s 58ms/step
Evaluation results:
MAE on train data: 0.02501
MSE on train data: 0.00307
MAE on validation data: 0.0268
MSE on validation data: 0.0058
MAE on test data: 0.03543
MSE on test data: 0.00586
# import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import joblib
from tensorflow.keras.models import load_model
def get_X_klasifikasi(data, date):
    # select the row at the given timestamp, keeping only date and height
    X_klasifikasi = data.loc[data['date'] == date].reset_index(drop=True)
    X_klasifikasi = X_klasifikasi[['date', 'height']]
    return X_klasifikasi
def klasifikasi_banjir(X, scaler_X, model):
    # scale the height feature
    X_klasifikasi_scaled = scaler_X.transform(X[['height']])
    # predict class probabilities and take the argmax as the status label
    y_klasifikasi = model.predict(X_klasifikasi_scaled, verbose=0)
    y_klasifikasi = np.argmax(y_klasifikasi, axis=1)
    # join the predicted status back onto the input frame
    y_klasifikasi = pd.DataFrame(y_klasifikasi, columns=['status'])
    df_klasifikasi = X.join(y_klasifikasi)
    return df_klasifikasi
def get_X_prediksi(data, date):
    # take the 500 most recent rows up to `date` (288 window steps + 108 lag steps = 396 rows needed, 500 > 396)
    data_history = data.loc[data['date'] <= date].head(500).sort_values(by=['date']).reset_index(drop=True)
    # lagged features: 18 steps = 3 h, 108 steps = 18 h (10-minute steps)
    data_history['cloudcover_3h'] = data_history['cloudcover'].shift(18)
    data_history['humidity_18h'] = data_history['humidity'].shift(108)
    data_history['height_diff_18h'] = data_history['height'] - data_history['height'].shift(108)
    # keep the last complete 288-step window
    data_history = data_history.dropna().tail(288).reset_index(drop=True)
    data_history = data_history[['date', 'height', 'windgust', 'cloudcover_3h', 'humidity_18h', 'height_diff_18h']]
    return data_history
def prediksi_banjir(data, date, X, scaler_X, scaler_y, model):
    # select and scale the model features
    X_prediksi = X[['height', 'windgust', 'cloudcover_3h', 'humidity_18h', 'height_diff_18h']]
    X_prediksi_scaled = scaler_X.transform(X_prediksi)
    # reshape to (batch, window, features)
    X_prediksi_scaled = X_prediksi_scaled.reshape(1, 288, 5)
    # predict the next 36 steps (6 hours) and invert the scaling back to cm
    y_prediksi = model.predict(X_prediksi_scaled, verbose=0)
    y_prediksi = y_prediksi.reshape(36, 1)
    y_prediksi_inverse = scaler_y.inverse_transform(y_prediksi)
    y_prediksi_inverse = pd.DataFrame(y_prediksi_inverse, columns=['height'])
    # actual future values for comparison (36 steps after `date`)
    data_future = data.loc[data['date'] > date].tail(36).sort_values(by=['date']).reset_index(drop=True)
    data_future = data_future[['date', 'height']].rename(columns={'height': 'height_true'})
    # combine true and predicted heights
    df_pred = data_future.join(y_prediksi_inverse)
    return df_pred
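The `inverse_transform` step above maps scaled predictions back to centimetres. A minimal round-trip illustration with sklearn's `MinMaxScaler` (the 0–200 cm range here is an example, not the project's actual fitted scaler):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

scaler_y = MinMaxScaler()
scaler_y.fit(np.array([[0.0], [200.0]]))          # fit on an example height range of 0-200 cm
scaled = scaler_y.transform(np.array([[100.0]]))  # 100 cm maps to the middle of the [0, 1] range
restored = scaler_y.inverse_transform(scaled)     # maps back to centimetres
```

The same scaler objects fitted during training must be reused at inference time, which is why they are persisted with joblib and reloaded below.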
def get_info(y_klasifikasi, X, y_pred_status):
    date = y_klasifikasi['date'][0]
    height = y_klasifikasi['height'][0]
    status = y_klasifikasi['status'][0]
    # current classification info
    print('-----------------------------------------------------------------------------')
    print('Datetime       :', date)
    print('Current height :', height.round(2), 'cm')
    # status-dependent messages (each predicted step = 10 minutes)
    if status == 0:
        print('Current status : SIAGA 0\n')
        if (y_pred_status['status'] == 0).all():  # all predicted steps stay at SIAGA 0
            print('Info : [SAFE] Expected to remain at SIAGA 0 for the next 6 hours.')
            print('       No flooding expected.')
        elif (y_pred_status['status'] == 1).any() and not (y_pred_status['status'] == 2).any():
            # SIAGA 1 appears but SIAGA 2 does not
            t_siaga1_start = (y_pred_status[y_pred_status['status'] == 1].index.min() + 1) * 10
            print(f'Info : [CAUTION] Expected to reach SIAGA 1 within the next {t_siaga1_start} minutes.')
            print('       Please monitor the water level regularly.')
        elif (y_pred_status['status'] == 2).any():  # SIAGA 2 appears
            t_siaga2_start = (y_pred_status[y_pred_status['status'] == 2].index.min() + 1) * 10
            print(f'Info : [DANGER] Expected to reach SIAGA 2 within the next {t_siaga2_start} minutes.')
            print('       Flooding is possible; evacuate immediately.')
        else:
            print('Info : -')
    elif status == 1:
        print('Current status : SIAGA 1\n')
        if (y_pred_status['status'] == 0).all():
            print('Info : [SAFE] Expected to return to SIAGA 0 within the next 10 minutes.')
            print('       No flooding expected.')
        elif (y_pred_status['status'] == 0).any() and not (y_pred_status['status'] == 2).any():
            t_siaga1_end = (y_pred_status[y_pred_status['status'] == 1].index.max() + 2) * 10
            print(f'Info : [SAFE] Expected to return to SIAGA 0 within the next {t_siaga1_end} minutes.')
            print('       No flooding expected.')
        elif (y_pred_status['status'] == 1).all():
            print('Info : [CAUTION] Expected to remain at SIAGA 1 for the next 6 hours.')
            print('       Please monitor the water level regularly.')
        elif (y_pred_status['status'] == 2).any():
            t_siaga2_start = (y_pred_status[y_pred_status['status'] == 2].index.min() + 1) * 10
            print(f'Info : [DANGER] Expected to reach SIAGA 2 within the next {t_siaga2_start} minutes.')
            print('       Flooding is possible; evacuate immediately.')
        else:
            print('Info : -')
    elif status == 2:
        print('Current status : SIAGA 2\n')
        if not (y_pred_status['status'] == 2).any():
            print('Info : [CAUTION] SIAGA 2 is expected to end within the next 10 minutes.')
            print('       Please monitor the water level regularly.')
        elif (y_pred_status['status'] == 2).any() and not (y_pred_status['status'] == 2).all():
            t_siaga2_end = (y_pred_status[y_pred_status['status'] == 2].index.max() + 2) * 10
            print(f'Info : [DANGER] Expected to remain at SIAGA 2 for the next {t_siaga2_end} minutes.')
            print('       Flooding is possible; evacuate immediately.')
        elif (y_pred_status['status'] == 2).all():
            print('Info : [DANGER] Expected to remain at SIAGA 2 for the next 6 hours.')
            print('       Flooding is possible; evacuate immediately.')
        else:
            print('Info : -')
    print('-----------------------------------------------------------------------------')
    # plot: the 288-step history (negative step indices), current point, true vs predicted future
    history = X.reset_index()
    history['index'] = history['index'] - 287
    history = history.set_index('index')
    plt.figure(figsize=(10, 4))
    plt.plot(history['height'], 'k.', label='history')
    plt.plot(history['height'].tail(1), 'yo', label='now')
    plt.plot(y_pred_status['height_true'], '.', label='height true')
    plt.plot(y_pred_status['height'], 'r.', label='height pred')
    plt.ylabel('Height (cm)')
    plt.xlabel('Step')
    plt.title('Flood prediction for the next 6 hours', fontweight='bold', fontsize=12)
    plt.legend(loc='upper left')
    plt.show()  # render the figure
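The minutes-until-a-status figures in `get_info` come from the index of the first (or last) predicted step with that status, at 10 minutes per step. A small illustration with a hypothetical 6-step status sequence:

```python
import pandas as pd

# hypothetical predicted statuses for 6 future steps (10 minutes each):
# SIAGA 1 first appears at index 3, SIAGA 2 at index 5
y_pred_status = pd.DataFrame({'status': [0, 0, 0, 1, 1, 2]})

# +1 converts the 0-based index to a step count before multiplying by 10 minutes
t_siaga1_start = (y_pred_status[y_pred_status['status'] == 1].index.min() + 1) * 10
t_siaga2_start = (y_pred_status[y_pred_status['status'] == 2].index.min() + 1) * 10
```

Here SIAGA 1 is announced 40 minutes ahead and SIAGA 2 60 minutes ahead, matching the `(index + 1) * 10` arithmetic used in the function.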
# load scaler & model
scaler_X_klasifikasi = joblib.load('scaler/scaler_X_klasifikasi.save')
model_klasifikasi_banjir = load_model('model/model_klasifikasi_banjir.h5')
scaler_X_prediksi = joblib.load('scaler/scaler_X_prediksi.save')
scaler_y_prediksi = joblib.load('scaler/scaler_y_prediksi.save')
model_prediksi_banjir = load_model('model/model_prediksi_banjir.h5')
# read data and format date
data_simulasi = pd.read_csv('dataset/data_simulasi_banjir_sorted.csv')
data_simulasi['date'] = data_simulasi['date'] + ':00'
data_simulasi['date'] = pd.to_datetime(data_simulasi['date'], format='%d/%m/%Y %H:%M:%S')
data_simulasi
| date | height | temp | feelslike | dew | humidity | precip | precipprob | windgust | windspeed | winddir | sealevelpressure | cloudcover | visibility | solarradiation | uvindex | severerisk | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2022-09-30 10:30:00 | 66.708667 | 26.8 | 28.1 | 19.6 | 64.70 | 0.0 | 0 | 8.6 | 5.4 | 292.1 | 1011 | 100.0 | 24.1 | 756 | 8 | 10 |
| 1 | 2022-09-30 10:20:00 | 66.912000 | 26.8 | 28.1 | 19.6 | 64.70 | 0.0 | 0 | 8.6 | 5.4 | 292.1 | 1011 | 100.0 | 24.1 | 756 | 8 | 10 |
| 2 | 2022-09-30 10:10:00 | 66.713333 | 26.8 | 28.1 | 19.6 | 64.70 | 0.0 | 0 | 8.6 | 5.4 | 292.1 | 1011 | 100.0 | 24.1 | 756 | 8 | 10 |
| 3 | 2022-09-30 10:00:00 | 66.380667 | 26.8 | 28.1 | 19.6 | 64.70 | 0.0 | 0 | 8.6 | 5.4 | 292.1 | 1011 | 100.0 | 24.1 | 756 | 8 | 10 |
| 4 | 2022-09-30 09:50:00 | 66.340000 | 25.4 | 25.4 | 19.4 | 69.42 | 0.0 | 0 | 6.5 | 3.6 | 314.2 | 1012 | 86.5 | 24.1 | 545 | 5 | 10 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3195 | 2022-09-08 06:00:00 | 59.662667 | 18.8 | 18.8 | 16.6 | 87.05 | 0.0 | 0 | 4.0 | 5.4 | 77.1 | 1013 | 92.8 | 24.1 | 0 | 0 | 10 |
| 3196 | 2022-09-08 05:50:00 | 59.910667 | 19.0 | 19.0 | 17.3 | 89.87 | 0.0 | 0 | 4.7 | 6.5 | 81.5 | 1013 | 100.0 | 24.1 | 0 | 0 | 10 |
| 3197 | 2022-09-08 05:40:00 | 60.045333 | 19.0 | 19.0 | 17.3 | 89.87 | 0.0 | 0 | 4.7 | 6.5 | 81.5 | 1013 | 100.0 | 24.1 | 0 | 0 | 10 |
| 3198 | 2022-09-08 05:30:00 | 60.056000 | 19.0 | 19.0 | 17.3 | 89.87 | 0.0 | 0 | 4.7 | 6.5 | 81.5 | 1013 | 100.0 | 24.1 | 0 | 0 | 10 |
| 3199 | 2022-09-08 05:20:00 | 59.976667 | 19.0 | 19.0 | 17.3 | 89.87 | 0.0 | 0 | 4.7 | 6.5 | 81.5 | 1013 | 100.0 | 24.1 | 0 | 0 | 10 |
3200 rows × 17 columns
data_simulasi[['height']].describe().T
| count | mean | std | min | 25% | 50% | 75% | max | |
|---|---|---|---|---|---|---|---|---|
| height | 3200.0 | 66.393319 | 25.277944 | 33.896667 | 49.694 | 58.780333 | 73.725833 | 197.556 |
This simulation dataset contains 'height' values between 33.90 cm and 197.56 cm, which is sufficient for simulating both flood classification and prediction because all three statuses occur: SIAGA 0, SIAGA 1, and SIAGA 2.
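For reference, the siaga thresholds used in this notebook (SIAGA 0 up to 100 cm, SIAGA 1 up to 150 cm, SIAGA 2 above 150 cm, as in the labelling cell at the end of this section) can be written as a small helper; `status_siaga` is an illustrative name, not a function defined in the notebook:

```python
def status_siaga(height_cm):
    # SIAGA 0: height <= 100 cm, SIAGA 1: 100 < height <= 150 cm, SIAGA 2: above 150 cm
    if height_cm <= 100:
        return 0
    if height_cm <= 150:
        return 1
    return 2
```

Applied to the dataset extremes, 33.90 cm maps to SIAGA 0 and 197.56 cm to SIAGA 2, so the full status range is exercised.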
# check the datetimes around the simulation window
data_cek = data_simulasi.sort_values(by='date', ascending=True).reset_index(drop=True)
a = 1800
plt.figure(figsize=(15, 3))
plt.plot(data_cek['height'], '-')
plt.axvline(a - 500, color='black', linestyle='--')  # start of the 500-row history window
plt.axvline(a, color='black', linestyle='--')        # "now"
plt.axvline(a + 36, color='black', linestyle='--')   # end of the 36-step forecast horizon
plt.axhline(100, color='grey', linestyle='dotted')   # SIAGA 1 threshold
plt.axhline(150, color='grey', linestyle='dotted')   # SIAGA 2 threshold
plt.show()
data_cek.date.loc[a-1]
Timestamp('2022-09-20 17:10:00')
#date_now = input('date: ')
date_now = '2022-09-17 17:10:00'
# classification
X_klasifikasi=get_X_klasifikasi(data=data_simulasi,
date=date_now)
y_klasifikasi=klasifikasi_banjir(X=X_klasifikasi,
scaler_X=scaler_X_klasifikasi,
model=model_klasifikasi_banjir)
# prediction
X_prediksi=get_X_prediksi(data=data_simulasi,
date=date_now)
y_pred=prediksi_banjir(data=data_simulasi,
date=date_now,
X=X_prediksi,
scaler_X=scaler_X_prediksi,
scaler_y=scaler_y_prediksi,
model=model_prediksi_banjir)
pred_status=klasifikasi_banjir(X=y_pred,
scaler_X=scaler_X_klasifikasi,
model=model_klasifikasi_banjir)
get_info(y_klasifikasi=y_klasifikasi,
X=X_prediksi,
y_pred_status=pred_status)
-----------------------------------------------------------------------------
Datetime       : 2022-09-17 17:10:00
Current height : 67.17 cm
Current status : SIAGA 0

Info : [SAFE] Expected to remain at SIAGA 0 for the next 6 hours.
       No flooding expected.
-----------------------------------------------------------------------------
# date_now = input('date: ')
date_now = '2022-09-20 17:50:00'
# classification
X_klasifikasi=get_X_klasifikasi(data=data_simulasi,
date=date_now)
y_klasifikasi=klasifikasi_banjir(X=X_klasifikasi,
scaler_X=scaler_X_klasifikasi,
model=model_klasifikasi_banjir)
# prediction
X_prediksi=get_X_prediksi(data=data_simulasi,
date=date_now)
y_pred=prediksi_banjir(data=data_simulasi,
date=date_now,
X=X_prediksi,
scaler_X=scaler_X_prediksi,
scaler_y=scaler_y_prediksi,
model=model_prediksi_banjir)
pred_status=klasifikasi_banjir(X=y_pred,
scaler_X=scaler_X_klasifikasi,
model=model_klasifikasi_banjir)
get_info(y_klasifikasi=y_klasifikasi,
X=X_prediksi,
y_pred_status=pred_status)
-----------------------------------------------------------------------------
Datetime       : 2022-09-20 17:50:00
Current height : 77.0 cm
Current status : SIAGA 0
Info : [CAUTION] Expected to reach SIAGA 1 within the next 60 minutes.
       Please monitor the water level regularly.
-----------------------------------------------------------------------------
Prediction plot: [figure rendered below the cell]
#date_now = input('date: ')
date_now = '2022-09-27 18:30:00'
# classification
X_klasifikasi = get_X_klasifikasi(data=data_simulasi,
                                  date=date_now)
y_klasifikasi = klasifikasi_banjir(X=X_klasifikasi,
                                   scaler_X=scaler_X_klasifikasi,
                                   model=model_klasifikasi_banjir)
# prediction
X_prediksi = get_X_prediksi(data=data_simulasi,
                            date=date_now)
y_pred = prediksi_banjir(data=data_simulasi,
                         date=date_now,
                         X=X_prediksi,
                         scaler_X=scaler_X_prediksi,
                         scaler_y=scaler_y_prediksi,
                         model=model_prediksi_banjir)
pred_status = klasifikasi_banjir(X=y_pred,
                                 scaler_X=scaler_X_klasifikasi,
                                 model=model_klasifikasi_banjir)
get_info(y_klasifikasi=y_klasifikasi,
         X=X_prediksi,
         y_pred_status=pred_status)
-----------------------------------------------------------------------------
Datetime       : 2022-09-27 18:30:00
Current height : 103.26 cm
Current status : SIAGA 1
Info : [DANGER] Expected to reach SIAGA 2 within the next 40 minutes.
       Flooding is likely; evacuate immediately.
-----------------------------------------------------------------------------
Prediction plot: [figure rendered below the cell]
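The three cells above run the same classify-then-predict pipeline and only `date_now` changes. A minimal sketch of a wrapper that folds the repetition into one call, assuming the helper signatures used above; the callables are hypothetical stand-ins with the scaler and model arguments bound in, so the sketch stays self-contained:

```python
# Hypothetical wrapper over the repeated simulation cells. The callables
# stand in for get_X_klasifikasi / klasifikasi_banjir / get_X_prediksi /
# prediksi_banjir / get_info with their scaler and model arguments bound.
def run_simulation(date, data, get_X_cls, classify, get_X_pred, predict, report):
    X_cls = get_X_cls(data=data, date=date)           # features at `date`
    y_cls = classify(X_cls)                           # current alert status
    X_pred = get_X_pred(data=data, date=date)         # features for forecasting
    y_pred = predict(data=data, date=date, X=X_pred)  # forecast heights
    pred_status = classify(y_pred)                    # forecast alert statuses
    return report(y_cls, X_pred, pred_status)

# toy stand-ins just to show the call flow
status = run_simulation(
    "2022-09-20 17:50:00", data=None,
    get_X_cls=lambda data, date: [77.0],
    classify=lambda X: ["SIAGA 1" if h > 100 else "SIAGA 0" for h in X],
    get_X_pred=lambda data, date: [77.0],
    predict=lambda data, date, X: [h + 30 for h in X],
    report=lambda now, X, future: (now[0], future[0]),
)
print(status)  # -> ('SIAGA 0', 'SIAGA 1')
```

With the real helpers bound via `functools.partial`, each simulation would reduce to a single `run_simulation(date_now, data_simulasi, ...)` call.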
# simulation data
df = data_simulasi[['height']].copy()
# define status: aman (siaga 0) = 0, siaga 1 = 1, siaga 2 = 2
df['status'] = np.where(df['height'] <= 100, 0,
                        np.where(df['height'] <= 150, 1,
                                 2))
# split features and labels
X = df[['height']]
y = df[['status']]
# scaling with the scaler fitted on the training data
scaler_X = scaler_X_klasifikasi
X_scaled = scaler_X.transform(X)
y.value_counts()
status
0         2882
1          276
2           42
dtype: int64
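The nested `np.where` mapping can be spot-checked on a few boundary heights (100 cm and 150 cm are the siaga 1 and siaga 2 thresholds):

```python
import numpy as np
import pandas as pd

# boundary cases for the alert thresholds (<=100 -> 0, <=150 -> 1, else 2)
toy = pd.DataFrame({'height': [33.9, 100.0, 100.1, 150.0, 150.1, 197.56]})
toy['status'] = np.where(toy['height'] <= 100, 0,
                         np.where(toy['height'] <= 150, 1, 2))
print(toy['status'].tolist())  # -> [0, 0, 1, 1, 2, 2]
```

Note that both thresholds are inclusive: a height of exactly 100 cm is still siaga 0.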
# evaluate the classification model on the simulation data
simulasi_scores = model_klasifikasi_banjir.evaluate(X_scaled, y, verbose=0)
# dataframe for the evaluation results
df_eval = pd.DataFrame(index=['simulasi'],
                       columns=['loss', 'accuracy'])
df_eval.loc['simulasi', 'loss'] = simulasi_scores[0]
df_eval.loc['simulasi', 'accuracy'] = simulasi_scores[1]
df_eval
| loss | accuracy | |
|---|---|---|
| simulasi | 0.023179 | 0.998125 |
df = data_simulasi.copy()
# lag features: the data is sampled every 10 minutes,
# so shift(18) looks back 3 hours and shift(108) looks back 18 hours
df['cloudcover_3h'] = df['cloudcover'].shift(18)
df['humidity_18h'] = df['humidity'].shift(108)
df['height_diff_18h'] = df['height'] - df['height'].shift(108)
df = df.dropna().reset_index(drop=True)
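With 10-minute sampling, `shift(18)` reaches back 3 hours and `shift(108)` reaches back 18 hours. A toy series makes the alignment visible (the first `n` rows of an `n`-step shift become NaN, which is why `dropna()` follows):

```python
import pandas as pd

# toy series sampled every 10 minutes
s = pd.Series(range(10), index=pd.date_range('2022-09-20', periods=10, freq='10min'))
lag3 = s.shift(3)  # value from 3 steps (30 minutes) earlier
print(lag3.iloc[3])  # the row at 00:30 now carries the 00:00 value -> 0.0
```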
# split features/predictors and label
X = df[['height','windgust','cloudcover_3h','humidity_18h','height_diff_18h']]
y = df[['height']]
# X
scaler_X = scaler_X_prediksi
X_scaled = scaler_X.transform(X)
# y
scaler_y = scaler_y_prediksi
y_scaled = scaler_y.transform(y)
# reshape
y_scaled = y_scaled.reshape(-1)
# sliding-window function: each sample is `window_size` consecutive rows of
# features, labelled with the next `future_size` values of the target
def create_window(data, window_size, future_size, label):
    X_window = []
    y_window = []
    for i in range(len(data) - window_size - future_size):
        X_window.append(data[i:i+window_size])
        y_window.append(label[i+window_size:i+window_size+future_size])
    return np.array(X_window), np.array(y_window)
# window (3): 288 steps (48 h) of history predicting 36 steps (6 h) ahead
window_size_3 = 288
future_size_3 = 36
X_window_3, y_window_3 = create_window(X_scaled, window_size_3, future_size_3, y_scaled)
print(X_window_3.shape, y_window_3.shape)
(2768, 288, 5) (2768, 36)
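The shapes follow directly from the window arithmetic: 3092 usable rows minus 288 history steps minus 36 future steps leaves 2768 samples. The same function can be checked on toy data:

```python
import numpy as np

# same windowing logic as above, restated on toy data to check the shapes
def create_window(data, window_size, future_size, label):
    X_window, y_window = [], []
    for i in range(len(data) - window_size - future_size):
        X_window.append(data[i:i+window_size])
        y_window.append(label[i+window_size:i+window_size+future_size])
    return np.array(X_window), np.array(y_window)

data = np.zeros((100, 5))   # 100 rows, 5 features
label = np.zeros(100)
Xw, yw = create_window(data, window_size=20, future_size=4, label=label)
print(Xw.shape, yw.shape)  # -> (76, 20, 5) (76, 4)
```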
from sklearn.metrics import mean_absolute_error, mean_squared_error
# named evaluate_model to avoid shadowing Python's built-in eval()
def evaluate_model(model, X_window, y_window):
    y_pred = model.predict(X_window)
    print("Evaluation results:")
    print("MAE on simulation data:", mean_absolute_error(y_window, y_pred).round(5))
    print("MSE on simulation data:", mean_squared_error(y_window, y_pred).round(5))
# show the evaluation results on the simulation windows
evaluate_model(model_prediksi_banjir, X_window_3, y_window_3)
87/87 [==============================] - 5s 60ms/step
Evaluation results:
MAE on simulation data: 0.0332
MSE on simulation data: 0.00302
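As a sanity check of the metrics themselves (both are computed on the scaled target here, which is why the values are small), a tiny worked example:

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = [0.0, 0.5, 1.0]
y_pred = [0.1, 0.5, 0.8]
mae = mean_absolute_error(y_true, y_pred)  # (0.1 + 0.0 + 0.2) / 3 = 0.1
mse = mean_squared_error(y_true, y_pred)   # (0.01 + 0.00 + 0.04) / 3 ~ 0.0167
print(round(mae, 4), round(mse, 4))
```

MSE penalizes the 0.2 miss more heavily than MAE does, which is why both are reported above.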
# read data and format date
data_simulasi = pd.read_csv('dataset/data_simulasi_banjir_sorted.csv')
data_simulasi['date'] = data_simulasi['date'] + ':00'
data_simulasi['date'] = pd.to_datetime(data_simulasi['date'], format='%d/%m/%Y %H:%M:%S')
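The CSV stores dates as `dd/mm/yyyy hh:mm`; appending `:00` and parsing with an explicit format string avoids day/month ambiguity:

```python
import pandas as pd

raw = '12/11/2022 23:55'  # day/month/year, as in the CSV
ts = pd.to_datetime(raw + ':00', format='%d/%m/%Y %H:%M:%S')
print(ts)  # -> 2022-11-12 23:55:00 (12 November, not 11 December)
```

Without the `format` argument, pandas could silently interpret this string as December 11.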
# rename columns
data_simulasi = data_simulasi.rename(columns = {'date':'datetime',
'height':'height (cm)',
'temp':'temp (C)',
'feelslike':'feelslike (C)',
'dew':'dew (C)',
'humidity':'humidity (%)',
'precip':'precip (mm)',
'precipprob':'precipprob (%)',
'windgust':'windgust (kph)',
'windspeed':'windspeed (kph)',
'winddir':'winddir (degree)',
'sealevelpressure':'sealevelpressure (mbar)',
'cloudcover':'cloudcover (%)',
'visibility':'visibility (km)',
'solarradiation':'solarradiation (W/m2)'})
data_simulasi.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3200 entries, 0 to 3199
Data columns (total 17 columns):
 #   Column                   Non-Null Count  Dtype
---  ------                   --------------  -----
 0   datetime                 3200 non-null   datetime64[ns]
 1   height (cm)              3200 non-null   float64
 2   temp (C)                 3200 non-null   float64
 3   feelslike (C)            3200 non-null   float64
 4   dew (C)                  3200 non-null   float64
 5   humidity (%)             3200 non-null   float64
 6   precip (mm)              3200 non-null   float64
 7   precipprob (%)           3200 non-null   int64
 8   windgust (kph)           3200 non-null   float64
 9   windspeed (kph)          3200 non-null   float64
 10  winddir (degree)         3200 non-null   float64
 11  sealevelpressure (mbar)  3200 non-null   int64
 12  cloudcover (%)           3200 non-null   float64
 13  visibility (km)          3200 non-null   float64
 14  solarradiation (W/m2)    3200 non-null   int64
 15  uvindex                  3200 non-null   int64
 16  severerisk               3200 non-null   int64
dtypes: datetime64[ns](1), float64(11), int64(5)
memory usage: 425.1 KB
import pickle
# save the simulation DataFrame with pickle
with open("data_simulasi_banjir.pkl", 'wb') as file:
    pickle.dump(data_simulasi, file)
# load it back to verify the round trip
with open("data_simulasi_banjir.pkl", 'rb') as file:
    loaded_data = pickle.load(file)
loaded_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3200 entries, 0 to 3199
Data columns (total 17 columns):
 #   Column                   Non-Null Count  Dtype
---  ------                   --------------  -----
 0   datetime                 3200 non-null   datetime64[ns]
 1   height (cm)              3200 non-null   float64
 2   temp (C)                 3200 non-null   float64
 3   feelslike (C)            3200 non-null   float64
 4   dew (C)                  3200 non-null   float64
 5   humidity (%)             3200 non-null   float64
 6   precip (mm)              3200 non-null   float64
 7   precipprob (%)           3200 non-null   int64
 8   windgust (kph)           3200 non-null   float64
 9   windspeed (kph)          3200 non-null   float64
 10  winddir (degree)         3200 non-null   float64
 11  sealevelpressure (mbar)  3200 non-null   int64
 12  cloudcover (%)           3200 non-null   float64
 13  visibility (km)          3200 non-null   float64
 14  solarradiation (W/m2)    3200 non-null   int64
 15  uvindex                  3200 non-null   int64
 16  severerisk               3200 non-null   int64
dtypes: datetime64[ns](1), float64(11), int64(5)
memory usage: 425.1 KB